Time to open up


If scientists want the public to continue to volunteer for research projects, they must learn to be a lot 
more forthcoming about the ways in which the information they garner will be used. 


Few biomedical researchers who work with human volunteers are entirely
happy with the rules that govern and preserve research ethics. Scientists
working on an experiment or a clinical trial can find that the complex
web of regulations that they must follow is a confusing burden. These
rules, they feel, simply add time, cost and complication to their studies.

Institutions are wary of falling foul of regulations that even highly
trained staff are hard-pressed to interpret. And research participants
(the volunteers without whom research such as clinical trials would not
be possible) often don’t understand all the implications contained in the
voluminous documents that they must sign before they take part in the
research. Although these forms are ostensibly designed to protect
participants, they are more often directed at protecting institutions
from liability.

The notion of informed consent has become especially problematic in this
modern age of ‘big data’. The need to seek and obtain permission from
volunteers can be traced back to research-ethics principles introduced in
response to revelations about the way Nazi doctors tortured people during
the Second World War. Decades ago, researchers could not glean much
information from a piece of stored tissue. But now, it is no longer
possible to explain to a research subject all the possible ways in which
their offered data might be used in the future, and it is becoming less
feasible to guarantee their privacy under (correct) rules that mandate
the deposition of research data in public databanks. Some large
data-collection projects have therefore adopted very broad
informed-consent provisions, in which research participants simply agree
that their data may be used for a wide and unspecified variety of
purposes. But as potential volunteers become savvier to the value and
vulnerability of their personal data, they may become less receptive to
this approach. Indeed, as our feature on page 312 shows, this may already
be happening.

Technology and some creative thinking should be able to provide a
solution to this problem. In an era when people can control who sees what
types of personal information on their Facebook page, programmers should
be able to design similar tools to give research volunteers a degree of
influence over who uses their data and for which types of research.
Experiments along these lines are now being tried by the company Private
Access, based in Irvine, California.

Scholars of ethics and law are also trying to think of new models for
informed consent that could accommodate the needs of both researchers and
research participants. For instance, a group based at Duke University in
Durham, North Carolina, and the University of North Carolina at Chapel
Hill has proposed remodelling consent provisions based on laws that
protect trade secrets. In its favour, this model would provide the option
of giving donors something in return for handing over their samples to a
large data collection, such as a biobank. It would ask the donors what
they want in return for the information embedded in the samples, such as
financial compensation (which many already receive) or a say in the types
of research their information is used for.

Other researchers have proposed reforms that head in the opposite 
direction — for instance, by relaxing the rules on consent for research 
involving ‘de-identified’ specimens, which are stripped of information 
that could link them back to their source. 

This approach may seem attractive because it would seem to lessen cost
and burden on the researchers, and could make the most of currently
under-exploited medical records and data samples. Extreme caution is
needed, however. It would take only one high-profile case of an unwanted
data disclosure to undermine support for research, and not just for the
field in question. Researchers need only look to the backlash against
newborn screening in the United States and tissue research in the United
Kingdom to realize how failing to obtain consent can set back their cause.

As this journal has argued previously, a more appropriate solution to the
conundrum of informed consent is to introduce greater openness to the
process.

Scientists could agree to give results containing medically relevant 
findings back to participants in genetic studies, such as information 
about their health. Although some have argued that this approach, as 
well as patient-controlled approaches, will add time, cost and complica- 
tion to studies, no one really knows if this is the case. Such concerns must 
not be allowed to derail the idea. 

Most large studies are funded by taxpayers or patient-advocacy
groups. Researchers are therefore obliged to listen when patients and
members of the public argue that they want to have more information,
not less, to ensure that they agree to offer their continued
participation in research. It might not make everybody happy, but it
should keep everybody involved.


Renewed vigour 


Stem-cell researchers must engage with 
politicians to keep their work alive in Europe. 


Research involving human embryonic stem (ES) cells is once more under
scrutiny in Europe. In a situation that will stir memories of the
acrimonious debates of 2006, legislators must again assess whether this
kind of work should still be funded under the forthcoming €80-billion
(US$100-billion) Horizon 2020 research programme.

Now, as then, opinion is split. Some countries and some members of
the European Parliament are in favour of the research, recognizing the
long-term potential to treat debilitating disease. Others maintain that 
it is immoral to exploit a technique that uses human embryos — even 
spare embryos from in vitro fertilization clinics that would other- 
wise be destroyed, and from which nearly all experimental human 
ES cell lines are derived. The 2006 debate was resolved with both the 
European Parliament and Council agreeing to fund such research, 
provided that it didn't involve the creation of new human ES cells, 
and provided that it was not carried out in those countries — such as 
Germany — whose national law banned it. 

Stem-cell researchers around the world breathed a sigh of relief, 
knowing how a decision in Europe could also influence funding 
decisions elsewhere. The outcome of the present debate offers simi- 
lar influence. But there are already signs that some members of the 
European Parliament will once again try to outlaw funding of research 
involving human ES cells. 

Rather than wait for these views to gain unchecked momentum, a 
group including UK research-funding bodies the Wellcome Trust, the 
Medical Research Council, the British Heart Foundation and Parkin- 
son’s UK last week issued a joint statement in support of the research, 
explaining the benefits of the work and the rationale to include it in 
Horizon 2020. 

It is a wise move that should help to anchor the coming debate to 
reality. Biology is complicated, which makes it easy for politicians to 
mislead the public and colleagues — intentionally or unintention- 
ally — in emotive areas. The general public, and politicians, have 
every right to question whether the ends justify the means used 
by medical researchers. But they also have the right to reliable 
information. 

The statement outlines the remarkable progress that stem-cell 
researchers have made since 2006, which has led in the past 12 months 
to the first approvals for clinical trials of potential therapies involving 
human ES cells — for a type of blindness called macular degenera- 
tion and for spinal-cord injury. In addition, scientists have discovered 


how to force adult cells back to an embryonic-like state. The resulting 
induced pluripotent stem (iPS) cells can then be grown into particular 
cell types and used to understand mechanisms of disease at the cellular 
level. In the long term, they may also be useful for therapy. 

The most insidious claim of those who oppose human ES cell research
holds that iPS cells, which can be derived from a particular patient’s
own cells and are ethically unburdened, eliminate the need for human ES
cells in research and therapy. That concept sounds appealing, but it is
simply not true. Scientists understand little of the differences between
the two sorts of stem cells, and it will take years of comparative work
to do so.

One particular event that makes biomedical researchers worry is a
decision taken last year by the European Court of Justice. In October it
ruled that patenting of inventions involving human ES cells was illegal
because it was immoral and, as a consequence, human ES cell research must
also be immoral (see Nature 480, 310-312; 2011). Nature condemned this
ruling as being beyond the court's juridical and technical competencies
(see Nature 480, 291-292; 2011). But members of the European Parliament
who oppose human ES cell research stealthily inserted a reference to it
during an unrelated resolution on broad patenting of essential biological
processes in animal and plant breeding, which was adopted on 10 May.

Horizon 2020 has to be approved by the European Parliament and Council
by mid-2013 so that first calls for proposals can be launched at the
start of 2014. The UK research funding agencies’ statement is a good
start to a continuous campaign of education and transparency that
stem-cell researchers from all interested European countries must
maintain for the next year. Just as cell-culture medium needs to be
renewed to keep its cargo alive, a political message has to be constantly
renewed if it is to stay alive in political minds.


Serious questions 


Nature Publishing Group’s reader survey on 
lab-safety practices needs your input. 


Scientific laboratories are dangerous places. Noxious chemicals, naked
flames and nasty microbes abound. The white laboratory coat, a
long-standing symbol of science to many outsiders, offers some protection
against these implicit threats. White coats are ubiquitous in fictional
labs in films and on television, but how many lab scientists actually
wear one? And, perhaps more importantly, how many should do, but don’t?
Are you wearing one right now? Are your colleagues? Does it matter? Would
you tell anybody if it did? And, if not a lab coat, what about those
protective goggles? They get so hot in summer, don’t they? Is it really
that big a deal if you leave them on the hook just this once?

It is easy for scientists, especially those who have been around for a
while and so tend to be in charge, to take a cavalier attitude to safety,
purely because science is so much safer now than it was when they began.
And although it is true that laboratories and lab culture have improved
since the reckless days of the 1950s and 1960s, accidents still happen.
Sometimes, these accidents are fatal. Laboratories do still kill people.

In an Editorial on the subject last April (Nature 472, 259; 2011),
prompted by the death of physics and astronomy undergraduate student
Michele Dufault in a workshop at Yale University in New Haven,
Connecticut, this publication warned against complacency when it comes to
lab safety. A common complaint among environmental health and safety
officers in universities and elsewhere, we
noted, is that “there is no good source of consistent data on labora- 
tory accidents, which could be studied to determine effective safety 
interventions”. That the working environment for scientists is safer 
now than in times past is less important than whether it is as safe 
now as it could, or should, be — and there is at present no way to say 
for sure that it is. 


Nature Publishing Group (NPG) has now joined with the University of
California, Los Angeles, and the software firm BioRAFT to try to fill in
some of the blanks. (BioRAFT, based in Cambridge, Massachusetts, has
investment from Digital Science,
owned by NPG’s parent company, Macmillan.) Together, the three 
have launched an online survey of international laboratory safety 
and working culture. Some readers will already have received invi- 
tations to participate, but everyone else is welcome, too: the survey 
can be found at go.nature.com/7Idjli. It should take about 15 min- 
utes to complete and is anonymous — although there is an option 
to leave an e-mail address for follow-up questions. The organizers 
hope that tens of thousands of working scientists will respond to 
questions about the environments they work in and the attitudes 
that they and their colleagues have to health and safety regulations. 
The survey also addresses research practice, including how many
people regularly work alone in a lab, and how often; training provision;
and whether scientists feel able to raise concerns about safety. Please
take the survey. Someone, some day, will benefit.
WORLD VIEW


Do not censor science in the name of biosecurity

Security officials should not be concerned about the publication of
mutant-flu research, says bio-weapons expert Tim Trevan.

NATURE.COM: For more on mutant flu, see go.nature.com/mbhmibi


The recent controversy over research into mutated versions of the H5N1
flu virus has focused on biosecurity concerns. It is easy to get the
impression that this debate has created a clear split between a
scientific community that wants the research to proceed and the results
to be published and a biosecurity community that doesn’t.

As a member of this biosecurity community for more than 30 years (I was
special adviser to the chairman of the United Nations weapons inspectors
in Iraq and covered chemical and biological disarmament with the UK
Foreign Office in both London and Geneva, Switzerland), I believe this to
be a false dichotomy. The research should be published in full, as it
will be this week.

In fact, I will go further and say that the whole concept of dual-use
biological research that is ‘of concern’ is flawed. It is a dangerous
distraction, an inappropriate hangover from nuclear-threat analysis.
Almost all biological knowledge can be either misused or applied for good.

Those concerned about publishing full details of the mutant-flu work say
that they fear the research will be misused to develop more-effective
biological weapons. But who would want to use a live, highly
transmissible, virulent organism as a weapon, and to what purpose? And
would censorship stop them?

Although such a weapon would strike terror and harm economies, its
impacts would be uncontrollable, indiscriminate and unpredictable. And
compared with conventional weapons, it would be slow to take effect and
relatively easy to combat, through prompt vaccination and treatment.

That severely restricts the number of potential users. An uncontrollable
weapon is unsuited to targeted attacks and its use would heap opprobrium
on the user. And if insurgents or terrorists unleashed a catastrophic and
indiscriminate attack on civilians, it would devastate sympathy for their
cause.

The only groups who might logically consider using such a weapon are
those for whom humans are the problem, such as environmental extremists
and animal-rights activists, or apocalyptic sects, such as the Japanese
terrorist cult Aum Shinrikyo, which released sarin gas in the Tokyo
underground in 1995. Then there are those who do not care about
casualties, such as a state or a regime that believes it faces imminent
existential threat, or suicide fighters.

Censorship of the H5N1 papers would not have kept the genie in the
bottle. Suppressing such papers or limiting access to their findings
might even encourage proliferation by drawing attention to the risks and
by provoking those researchers denied access to the results to seek to
replicate them.

Can we prevent proliferation by controlling research? Certainly,
researchers, institutional review boards and funders must consider the
implications of proposed research from the
outset and implement a full biosafety and biosecurity plan. Major
efforts have been made in this area. But to deny funding to projects 
with clear scientific or public-health value, even if they have some 
biosecurity risks, will drive research to undesirable sources of funding 
and prevent valuable research from being done. 

If the knowledge and the science cannot be contained, then what 
about access to the materials and equipment required to turn research 
results into weapons? The direction in which technology and scientific 
services are heading does not bode well for controlling proliferation 
in this way. Companies already make genes for mail order. Free gene- 
design software exists. DNA printers will probably be on lab bench tops 
within the decade. But it cannot be morally or politically defensible to 
prevent wide distribution of tools that are indispensable to public health 
and basic research. 

Warfare and terrorism are not the only biological risks that confront 
humanity. There is an entire spectrum of risks, from 
natural and accidental to deliberate. We are mostly 
helpless to prevent the periodic creation of new 
deadly diseases. We know that we face regular flu 
pandemics and that some will be particularly deadly. 

An analysis of the effect of carrying out and publishing such research
must compare two factors. The first, the cost, is the risk that
publication will lead to deliberate release, multiplied by the impact of
the release, multiplied by the frequency of release. The second, the
benefit, is the possible reduction in the 250,000-500,000 annual deaths
worldwide due to seasonal flu and the more than 12 million lives lost
annually to other infectious diseases, among other public-health benefits.

Precise calculation is not possible, but the 
evidence strongly suggests that the increase in risk is quite small. The 
known benefits of addressing public-health challenges from nature will 
almost always far outweigh the potential and unknowable increased 
risk of misuse. 

The bigger argument in favour of continued research into viral 
transmissibility and pathogenicity (the focus of the mutant-flu work) is 
that it will ultimately deter the use of biological weapons. 

The best strategy to stop biological attacks is to make biological
weapons unattractive by making preparedness and responses so effective
that the consequences are no worse than those of a train wreck. Increased
understanding of transmissibility and pathogenicity will enable countries
to identify threats earlier, develop better vaccines, produce them more
quickly and develop broad-spectrum defences to diseases. This will
protect against both nature and warfare.


Tim Trevan is executive director of the International Council for the 
Life Sciences in McLean, Virginia. 
e-mail: trevan@iclscharter.org 
RESEARCH HIGHLIGHTS

Selections from the scientific literature


Sequencing 
tracks outbreak 


Whole-genome sequencing 
has enabled researchers to 
confirm which patients in a 
hospital were affected by a 
particular outbreak of an 
antibiotic-resistant pathogen,
at a cost and on a timescale
that are clinically relevant.

Sharon Peacock at the 
University of Cambridge,
UK, and her team sequenced
isolates of methicillin-resistant 
Staphylococcus aureus (MRSA) 
from 14 hospitalized patients, 
half of whom had become 
carriers of a specific strain of 
MRSA during an outbreak in 
the neonatal intensive care unit. 
In about 1.5 days and at a cost 
of some US$150 per isolate, the 
authors generated sequences 
that showed a clear genetic 
distinction between outbreak 
and non-outbreak isolates. 

The researchers predict that, 
as whole-genome sequencing 
costs and turnaround times fall, 
this will become a standard tool 
for controlling the spread of 
dangerous pathogens. 

N. Engl. J. Med. 366, 2267-2275 
(2012) 


ANIMAL BEHAVIOUR


Castration boosts
spider stamina


The male orb-web nephilid spider often castrates himself during sex,
reducing his body weight by up to 9%. This could increase his endurance
when defending his mate from competitors.

Daiqin Li at the National University of Singapore and his colleagues
partly or fully castrated male spiders (Nephilengys malabarensis; fully
castrated pictured right) and then assessed their stamina by repeatedly
poking them with a paintbrush to goad them into moving.

Spiders that retained their genitals, or palps, quickly became
exhausted, whereas the much lighter eunuchs showed 80% greater locomotor
endurance. Spiders with one remaining palp kept going for 32% longer than
intact males (left).
Biol. Lett. http://dx.doi.org/10.1098/rsbl.2012.0285 (2012)


ECOLOGY


Bat culls do not stop spread of rabies


Culling adult vampire bats might not be an effective means of reducing
outbreaks of rabies in humans and livestock.

Daniel Streicker at the University of Georgia in Athens and his
colleagues tested common vampire bats (Desmodus rotundus; pictured)
sampled from 20 colonies across Peru between 2007 and 2010 for exposure
to the rabies virus. Exposure prevalence ranged from 3% to 28% and was
highest in immature bats. Culling during the test period did not reduce
the probability of exposure to rabies.

Adult bats might be developing immunity to rabies after repeated
exposure to the virus, the authors suggest, so culling could increase
virus transmission in part because it targets immune adults and leaves
behind young bats that are more likely to carry and to transmit the
disease.
Proc. R. Soc. B http://dx.doi.org/10.1098/rspb.2012.0538 (2012)


IMMUNOLOGY


Good microbes
fight bad


Microbes living in the guts and airways of mammals help their hosts to
fend off pathogens.

John Wherry and David Artis at the University of Pennsylvania in
Philadelphia and their team treated mice with antibiotics to kill off
their gut microbes and then infected them with an influenza virus. The
mice lost more weight and were more likely to die than those that did not
receive antibiotics. The antibiotic-fed mice also mounted a reduced
immune response to the virus. The authors suggest that the bacteria
living in mammals prime the immune system to respond to pathogens, and
say that harnessing this ability could aid in the treatment of viral
infections in humans.
Immunity http://dx.doi.org/10.1016/j.immuni.2012.04.011 (2012)


MATERIALS 


Graphene can 
desalinate water 


Salt could be separated from 
water using an ultrathin 
porous membrane made up 
of a single sheet of carbon 
atoms, a computational study 
suggests. 

David Cohen-Tanugi
and Jeffrey Grossman at the
Massachusetts Institute of 
Technology in Cambridge 
simulated the molecular 
interactions between graphene, 
salt and water, and looked at the 
effects of different pore sizes on 
salt filtration. The researchers 
found that graphene with a 
pore size of 0.7-0.9 nanometres 
should stop the passage of salt 
while letting water through. 
Attaching hydroxyl groups to 
the edges of the graphene pores 
would boost water flow-rate 
but impair salt rejection — 
because the chemical groups 
can swap with water molecules 
surrounding the salt ions. By 
contrast, attaching hydrogen 
atoms to the pores would 
improve filtration. 

Graphene promises to be 
many times more permeable 
to water than conventional 
membranes used in 
desalination, the authors say. 
Nano Lett. http://dx.doi.
org/10.1021/nl3012853 (2012)


MICROBIOLOGY 


Bacterial border 
defence 


Insects combat pathogenic 
bacteria by producing a 
polymer called melanin
and depositing it onto the
microbe’s surface. But one 
bacterial species has a weapon 
of its own — a cell-surface 
molecule that inhibits insects’ 
melanin-producing enzymes. 

Jon Clardy at Harvard 
Medical School in Boston, 
Massachusetts, and his team 
pinpointed the molecule, 
rhabduscin, on the surface of 
the bacterium Xenorhabdus 
nematophila. Nanomolar 
levels of the chemical blocked 
the activity of a melanin- 
producing enzyme from 
waxmoth larvae. 

X. nematophila that lack
a rhabduscin-producing
enzyme were less effective at 
killing the larvae than were 
normal bacteria. 

Genes that encode enzymes 
involved in rhabduscin 
production are also found 
in the pathogen that causes 
cholera, Vibrio cholerae. A 
similar defence mechanism
might exist in this bacterium,
the authors speculate. 

Proc. Natl Acad. Sci. USA 
http://dx.doi.org/10.1073/ 
pnas.1201160109 (2012) 


APPLIED PHYSICS 


Terahertz-wave 
detector 


Devices that emit and detect 
radiation in the terahertz 
part of the spectrum — 
between the infrared and 
microwave regions — have 
potential applications in 
imaging, including in medical 
diagnostics. Researchers have 
developed a compact and 
efficient terahertz detector that 
works at room temperature. 

Miriam Vitiello of 
the National Enterprise 
for nanoScience and 
nanoTechnology in Pisa, 
Italy, and her team built their 
detector out of indium arsenide 
nanowires 1.5 micrometres 
long and 30 nanometres in 
diameter. Radiation from 
a 1.5-terahertz emitter was 
funnelled to the detector from a
bow-tie-shaped antenna. 

The researchers suggest that 
the detector could be tuned 
to respond to even higher 
frequencies, and could be built 
into multi-pixel arrays, which 
are ideal for detectors. 
Appl. Phys. Lett. 100, 241101 
(2012) 


STEM CELLS


Human eye parts
in a dish


Retinal cells made from human embryonic stem cells could one day be used
to help restore sight in people with certain forms of blindness.

Yoshiki Sasai at the RIKEN Center for Developmental Biology in Kobe,
Japan, and his colleagues used the stem cells to generate retinal
epithelial cells, which are precursors for the retina. After a few weeks,
a single layer of these cells spontaneously formed a curved structure
called an optic cup. After several more weeks, the cup developed into a
multilayered structure with multiple retinal cell types, including
light-sensitive photoreceptors.

The researchers also devised a method for cryopreserving the retinal
tissue. They foresee that stored material could ultimately be triggered
to develop into specific cell types that can be grafted onto a patient’s
retina.
Cell Stem Cell 10, 771-785 (2012)
For a longer story on this research, see go.nature.com/ibgaqa


HIGHLY READ


Melting ice behind Arctic warming


Sea-ice loss seems to be the main culprit behind the rapid surface
warming in the Arctic, but it contributes next to nothing to the heating
of the area’s lower atmosphere.

The climate is warming faster in the Arctic than elsewhere. Using
simulations generated by two atmospheric circulation models, James Screen
at the University of Melbourne in Australia and his team disentangled and
quantified local and remote contributions to this ‘Arctic amplification’.

Sea-ice retreat and related changes in local sea surface temperature are
the main drivers of surface-level warming, the simulations suggest. By
contrast, most lower atmosphere warming seems to result from increased
atmospheric heat transport from lower latitudes. Apart from in July and
August, greenhouse gases and aerosols make little direct contribution to
Arctic warming, the authors note.
Geophys. Res. Lett. http://dx.doi.org/10.1029/2012GL051598 (2012)


One mummy but 
three people 


A mummy found at a site off
the coast of Scotland consists 
of remains from at least three 
individuals. 

Human skeletons dating 
from 1400-1100 BC were
previously unearthed at the
Cladh Hallan settlement. A
male skeleton was identified
as a composite of multiple
individuals on the basis 
of isotope data, but the 
status of a female mummy 
(pictured) was less certain. 
Terry Brown at the University 
of Manchester, UK, and his 
colleagues extracted DNA
from bones of the jaw, right
arm and right leg of the female 
remains. After excluding 
possible contamination, the 
researchers found that all three 
body parts came from different 
individuals. DNA analysis of 
the skull was inconclusive. 

The team suggests that the 
remains at Cladh Hallan were 
deliberately merged, perhaps to 
symbolically combine different 
ancestries into one lineage. 

J. Archaeol. Sci. http://dx.doi. 
org/10.1016/j.jas.2012.04.030
(2012) 
The news in brief


Marine reserves

Australia’s government has unveiled its final plans to create what will
be the world’s largest network of marine reserves, covering 3.1 million
square kilometres of ocean along the nation’s coasts. Researchers were
worried by draft proposals last year (see Nature 480, 14-15; 2011), but
the plans released on 14 June addressed their concerns by including small
but significant extensions to the reserve boundaries, to increase
protection to regions such as the Coral Sea. The government expects the
reserves to become law at the end of this year, after a public
consultation. See go.nature.com/vnq8fw for more.


US research thrift

Research universities in the United States need to become more efficient
and more productive, a 14 June report from the US National Academies
urges. The report, requested by Congress in 2009, recommends ten fixes to
maintain quality, including improving cost-effectiveness by sharing
expensive research equipment and facilities, and encouraging greater
collaboration. Federal funding has flattened or declined, and state
funding has dropped by 25% on average and up to 50% in some cases, the
report notes. See go.nature.com/6ppdtj for more.


Fisheries reform

Europe’s attempts to overhaul its much-criticized fisheries policy took a
step forward at a meeting in Luxembourg on 12 June. Ministers on the
council of the European Union agreed to a partial ban on the practice of
throwing away unwanted by-catch, and pledged to return fish stocks to
levels that produce the maximum sustainable yield by 2020. But critics
said that these agreements were insufficient, effectively allowing
Europe's fisheries to continue on their current unsustainable path. Any
reforms still need to be negotiated with the European Parliament. See
go.nature.com/tn8oea for more.


China celebrates space-station success

In a milestone for China's space programme, three astronauts boarded the
country’s orbiting Tiangong 1 space module on 18 June. Their flight on
the Shenzhou 9 craft was the country’s fourth manned space launch, but is
the first of a series of missions in efforts to build a manned space
station, the Tiangong (‘Heavenly Palace’), by 2020 (see Nature 473,
14-15; 2011). The mission carried China’s first female astronaut, Liu
Yang. See go.nature.com/f5qkka for more.


Ocean acidity

The International Atomic Energy Agency (IAEA), based in Vienna, is to
create a centre to facilitate and communicate research into ocean
acidification, it announced on 18 June. The centre, to be launched this
summer at the IAEA’s Environmental Laboratories in Monaco, is part of the
agency's remit to support peaceful uses of nuclear technology. Tracing of
the radioisotope calcium-45, for example, allows sensitive measurements
of calcium carbonate uptake in the skeletons of marine organisms,
affected by acidifying oceans.


German excellence

Thirty-nine universities have each won a share of €2.4 billion
(US$3 billion) in the second round of Germany’s Excellence Initiative, a
competition among the country’s institutions to win elite status and
funding for specialized research clusters and graduate schools for
2012-17. The decisions, announced on 15 June, spread cash across 99
separate projects.


Biomedical careers

The US National Institutes of Health on 14 June received two key reports
on how to improve prospects for young biomedical scientists. One suggests
raising minimum salaries for postdocs, among other recommendations, while
the other addresses the plight of under-represented minorities in
biomedical science. See page 304 for more.


Open access 


A report commissioned by 
the UK government has 

laid out how the country 
should accelerate a shift to 
open-access publishing. The 
19 June report, chaired by 
sociologist Janet Finch at the 
University of Manchester, UK, 
recommended that researchers 
should prepare to pay up front 
for the cost of publishing their 
research, a route known as 
gold open access. See page 302 
for more. 


Unethical research 
A US court has dismissed 

a lawsuit by Guatemalan 
citizens against US officials 
over American researchers 
who intentionally infected 
Guatemalans with sexually 
transmitted diseases in 

the 1940s (see Nature 482, 
148-152; 2012). A presidential 
bioethics commission has 
already issued a series of 
reports condemning the 
experiments, but on 13 June 
a judge ruled that the 
government was immune 
to prosecution in the case. 
Lawyers said they would 
appeal. See go.nature. 
com/1ffhél for more. 


Mikovits theft case 


Chronic-fatigue-syndrome 
researcher Judy Mikovits is no 
longer facing criminal charges 
for stealing lab notebooks, 
computers and other material 
from her former employer, the 
Whittemore Peterson Institute 
for Neuro-Immune Disease 

in Reno, Nevada. A Nevada 
district attorney dropped the 
charges last week, although 
Mikovits (known for her now- 
retracted work linking chronic 
fatigue syndrome to a virus) 
still faces a civil suit from the 
institute. See go.nature.com/ 
ukgegy for more. 


Nobel chemist dies 


Organic chemist William 
Knowles (pictured), who 
shared the 2001 Nobel Prize 
in Chemistry, died on 13 June 
aged 95. Knowles worked for 
four decades at the agricultural 
giant Monsanto in St Louis. 
He won the Nobel with Ryoji 
Noyori and Barry Sharpless for 
his work on chemical syntheses 
that selectively create one 
of two mirror-image forms 
(enantiomers) of a molecule. 


GM soya levy 


The biotechnology giant 
Monsanto is one step closer 
to losing billions of dollars in 
revenues from its genetically 
modified (GM) Roundup 
Ready soya beans in Brazil. 
Monsanto, headquartered in 
St Louis, Missouri, levies a 
charge on Brazilian farmers 
who grow soya beans that 
turn out to be GM. Farmers 
say it is impossible to avoid 
growing GM soya because of 
contamination, and in April 
they won a challenge in the 
state of Rio Grande do Sul, 
where a judge ruled that the 
company’s levy was illegal. The 
ruling is currently suspended, 
pending consideration by a 
higher court. But on 12 June, 
the Brazilian Supreme Court 
of Justice said it should apply 
nationwide. See 
go.nature.com/mowmyh for more. 


Fossil smuggling 

A nearly complete 
tyrannosaurid fossil that sold 
for US$1 million was illegally 
smuggled out of Mongolia, 
according to a civil complaint 
seeking the fossil’s return, 
which was filed on 18 June by 
the US Department of Justice. 
Despite a restraining order 
obtained by the Mongolian 
government from a Texas 
court to prevent the sale or 
transfer of the fossil, Heritage 
Auctions based in Dallas, 
Texas, auctioned off the 
Tarbosaurus bataar fossil in 
New York last month. 


RESEARCH 


Diesel cancer links 


Diesel exhaust is carcinogenic 
to humans, the International 
Agency for Research on Cancer 
(IARC) declared on 12 June 
after a meeting in Lyons, 
France. Diesel emissions 
were previously classed as 
‘probably’ carcinogenic; the 
latest conclusion followed 
the publication in March of a 
long-delayed US government 
study showing how exposure 
to diesel exhaust increases the 
risk of lung cancer in miners. 
The IARC’s pronouncement is 
purely scientific; it will be up 
to national regulatory agencies 


TREND WATCH 


The United States, which 

two years ago yielded the 

title of host to the world’s top 
supercomputer — bested first by 
China and then by Japan — has 
roared back into the lead in this 
month’s list of the world’s top 500 
supercomputers, with the Sequoia 
machine at Lawrence Livermore 
National Laboratory in California. 
It also claimed third place with 

a supercomputer at Argonne 
National Laboratory in Illinois. 
Italy made its top-ten debut with a 
system at the CINECA computing 
centre near Bologna. 


WORLD’S FASTEST COMPUTERS 


The United States reclaims the top spot for the first time in two years with a 
supercomputer at Lawrence Livermore National Laboratory in California. 


Chart legend: United States, Japan, Germany, China, Italy, France 


SEVEN DAYS | THIS WEEK | 


24 JUNE-6 JULY 
In St Petersburg, 
Russia, the United 
Nations Educational, 
Scientific and Cultural 
Organization’s World 
Heritage Committee 
meets to discuss the 
state of conservation 
sites including 
Australia’s Great 
Barrier Reef. 


whe36-russia2012.ru 


26-28 JUNE 

In Seattle, Washington, 
marine scientists plan 
out an international 
network to monitor 
the acidification of the 
oceans. 
go.nature.com/lopgt6 


to decide how to proceed. See 
go.nature.com/dddpaz for 
more. 


Space X-rays 
NASA’s NuSTAR telescope, 
which will examine high- 
energy X-rays produced at 
the thresholds of black holes 
(see Nature 483, 255; 2012), 
was launched into low- 
Earth orbit on 13 June. The 
low-cost mission is one of 
only a few available to X-ray 
astronomers. See go.nature. 
com/dcye8k for more. 


CORRECTIONS 

The story ‘Nobel laureate 
dies’ (Nature 486, 11; 
2012) should have said 
that Andrew Huxley did his 
award-winning work at the 
Laboratory of the Marine 
Biological Association in 
Plymouth, not the Plymouth 
Marine Laboratory. And 
‘Piezonuclear row’ (Nature 
486, 162; 2012) gave the 
wrong name for INRiM: it 
is the National Institute for 
Metrological Research. 


> NATURE.COM 
For daily news updates see: 
www.nature.com/news 


Luc Montagnier has controversial scientific views that 
may harm Cameroon’s AIDS institute, allege critics. 


Nobel fight over 
African HIV centre 


Laureates question choice of interim scientific director. 


BY DECLAN BUTLER 
A fledgling AIDS research centre in 
Cameroon, already struggling to find 
a scientific leader, is now facing 
insurrection from an unlikely quarter: a group of 
35 Nobel prizewinners. 

The laureates are calling for the centre's 
interim scientific director, fellow prizewin- 
ner Luc Montagnier, to be removed from 
the part-time post. Observers say that unless 
the leadership crisis is resolved quickly and 
decisively, it could harm the prospects of the 
Chantal Biya International Reference Centre 
(CIRCB) in Yaoundé. 


The centre has a comprehensive AIDS 
research and health-care programme, in par- 
ticular testing and treating newborn babies to 
reduce maternal transmission of HIV. It is the 
only research institution in central Africa with 
the technology and expertise to monitor peo- 
ple with HIV thoroughly, and one of the few 
African sources of hard data about the spread 
of the disease. It has an annual budget of about 
US$1 million, an array of international collab- 
orations and around 20 local staff members, 
most of whom trained abroad. 

Nature has learned that the Nobel laure- 
ates wrote on 9 June to Paul Biya, president of 
Cameroon, asking him to reconsider 
Montagnier’s appointment. Montagnier, head of 
the World Foundation for AIDS Research and 
Prevention in Paris, shared the 2008 Nobel Prize 
in Physiology or Medicine for discovering HIV. 

The laureates argue that his embrace of theo- 
ries that are far from the scientific mainstream, 
as well as what they claim are anti-vaccination 
views, risk hurting the CIRCB’s research, 
health-care programme and reputation. 
Montagnier has suggested, for example, that water 
can retain a ‘memory’ of pathogens that are 
no longer present¹; that the DNA sequences 
of pathogens emit electromagnetic waves that 
could be used to diagnose disease²; and that 
stimulating the immune system with 
antioxidants and nutritional supplements may help 
people to fight off AIDS³. 


HIGH-PROFILE OPPOSITION 

The letter was coordinated by Richard Roberts, 
a Nobel-prizewinning molecular biologist and 
chief scientific officer of New England Biolabs 
in Ipswich, Massachusetts, who also wrote per- 
sonally to Biya on 4 June, to resign from the 
CIRCB’s scientific board. Roberts says he is 
concerned that Montagnier plans to pursue his 
unorthodox research at the centre. Several other 
board members have also resigned. 

Robert Gallo, head of the Institute of Human 
Virology at the University of Maryland, Balti- 
more, who had battled with Montagnier over 
which of them had discovered HIV, has also 
entered the fray. On 4 June, Gallo wrote to Biya 
expressing concerns similar to those of the 
Nobel laureates and informing Biya that his 
institute, a founding sponsor of the CIRCB, was 
immediately severing its links with the centre. 

Montagnier deplores what he describes as 
“ad hominem attacks” and “plain lies”, and 
says that there is an “ignominious campaign” 
against him and his group. He says that his- 
tory is full of pioneers whose ideas were at 
first given a chilly reception by a conservative 
research community. “I believe this is hap- 
pening again to me, and it is very sad that it 
involves Nobel Prize laureates attacking a 
fellow laureate,” he says. 

The last straw for Montagnier’s critics seems 
to have been his appearance in May alongside 
vaccine sceptics at a conference in Chicago, 
Illinois, organized by US patient-advocacy 
groups AutismOne and Generation Rescue. 
Montagnier’s talk, on his hypothesis that bac- 
terial infections may be one of many causes of 
autism spectrum disorder, states: “There is in 
the blood of most autistic children — but not 
in healthy children — DNA sequences that emit, 
in certain conditions, electromagnetic waves.” 
Montagnier defends his research, point- 
ing out that some clinicians have observed 
improvements in symptoms of autism after 
long-term treatment with antibiotics. He says 
that he has never argued that vaccination could 
cause autism. “Many parents have observed a 
temporal association, which does not mean 
causation, between a vaccination and the 
appearance of autism symptoms,” he says. 
“Presumably vaccination, especially against 
multiple antigens, could be a trigger of a pre-existing 
pathological situation in some children.” 


LEADERSHIP CRISIS 

The CIRCB, founded in 2006, is named after 
President Biya’s wife, who has championed 
efforts to fight AIDS in Africa. Montagnier’s 
AIDS foundation was a founding partner; 
Montagnier is also president of the now- 
defunct scientific advisory board, and vice- 
president of the management board. 

The current crisis compounds problems 
caused by the centre’s lack of stable full-time 
leadership. In March, its management commit- 
tee appointed Montagnier to replace former 
interim scientific director Vittorio Colizzi, an 
AIDS researcher on secondment from the Tor 


Vergata University in Rome, who had held the 
post since 2009. Colizzi was standing in until 
a full-time scientific director could be hired, 
but a recruitment process last year failed to set- 
tle on an agreed candidate. Some candidates 
had also expressed misgivings about the job, 
because at the time the scientific director and 
administrative director had to share power, a 
situation that caused tensions, says Colizzi. To 
address this issue, a presidential decree issued 
on 31 May merged the positions to create the 
post of permanent director, with full control 
of the centre. The move should make it much 
easier to attract a leading scientist to the post, 
says Jacques Theze, an immunologist at the 
Pasteur Institute in Paris, a former member of 
the CIRCB’s scientific board. 

The decree also required that many of the 
centre’s posts and committees be disbanded 
or renewed, creating an uncertain transitional 
period. On the day that Roberts resigned, for 
example, the scientific board was officially 
dissolved, and no clear timetable has been set 
to reestablish it. Colizzi is concerned that this 
deprives the centre of its main mechanism for 
enforcing rigorous peer-review and ethical 
oversight of research proposals. Montagnier 
says that he intends to continue all research pre- 
viously approved by the board, and that he will 
ask the next board to review the programme. He 


also plans to embark on new research, including 
a “key project” using his electromagnetic-wave 
theory to detect reservoirs of HIV in the body 
that persist after antiretroviral treatment. Any 
new projects, including his own, will need to 
be approved by the centre's science board and 
ethics committee, he says. 

Jean Stéphane Biatcha, head of the centre’s 
management board and a presidential adviser, 
recognizes the “very serious disagreement” 
but says that the president and the Ministry of 
Health will quickly enact the 31 May decree, 
and so will renew the scientific advisory board 
and begin the search for a permanent director. 

Theze says that he would have preferred 
Montagnier’s detractors to have taken a more 
diplomatic approach, and warns that the high- 
level criticism, and the resulting controversy, 
risks tarnishing the credibility and reputation 
of the centre, which he says is unfair, because 
the CIRCB has enormous potential. He wor- 
ries that the episode might also discourage 
scientists from applying for the position of 
director. 


1. Montagnier, L., Aissa, J., Ferris, S., Montagnier, J.-L. 
& Lavallée, C. Interdisciplin. Sci. 1, 81-90 (2009). 

2. Montagnier, L. et al. Preprint at 
http://arxiv.org/abs/1012.5166 (2010). 

3. Montagnier, L. et al. Interdisciplin. Sci. 1, 245-253 
(2009). 

4. Butler, D. Nature 468, 743 (2010). 


Britain aims for 
broad open access 


But critics claim plan seeks to protect publishers’ interests. 


BY RICHARD VAN NOORDEN 


For years, countries have been edging 
towards open access for research, with 
some funding agencies requiring that 
researchers make their papers publicly 
available within a set period after publication. A 
report commissioned by the UK government 
recommends a more radical step: making all 
papers open access from the start, with authors 
paying publishers up-front to make their work 
free to read. 

The shift towards this ‘gold’ form of open 
access will create short-term financial burdens 
for research funders, the report acknowledges, 
but the economic and cultural benefits far 
outweigh the risks. Not everyone is convinced, 
however: research-intensive universities say 
they are concerned that the report plays down 
potentially cheaper ways to move to open access, 
in favour of sustaining publishers’ profits. 

“Momentum for open access is already 
under way, and it’s important for the United 
Kingdom to embrace that change, to accelerate 
it, and to manage it,” says Janet Finch, a 
sociologist at the University of Manchester, UK, who 
chaired the panel behind the report, which was 
released on 19 June. It is expected to set the 
national agenda for open access, and influence 
other countries to follow Britain’s lead. 

“The ultimate goal is to have a system where 
the full costs of research publication are met in 
advance,” says Martin Hall, another member of 
the panel and vice-chancellor of the University 
of Salford in Manchester. Globally, the number 
of gold articles is growing by about 30% each 
year, aided by the rise of journals such as PLoS 
ONE. But they still make up a minority of the 
world’s output — comprising about 12% of 
research articles indexed in Elsevier's Scopus 
database in 2011, according to preliminary 
estimates by Mikael Laakso and Bo-Christer 
Björk at the Hanken School of Economics in 
Helsinki (see ‘Rise of gold’). UK researchers 
tend to publish in higher-impact selective 
journals, so only 5% of their articles are gold open 
access, according to data collected by Yassine 
Gargouri, an informatician at the University of 
Quebec in Montreal, Canada (see ‘Open access 
in the UK’). 

As that proportion rises, the report notes, 
authors’ open-access costs will grow — but 
university libraries will still have to subscribe 
to most of the journals that currently line their 
shelves. Subscription costs will fall substan- 
tially only when most research articles are 
freely available. During the transition period, 
gold and subscription models will exist side by 
side, potentially increasing the overall costs of 
access. The report also recommends subsidis- 
ing subscription licences for health and busi- 
ness users to give them better access. Overall, 
the panel estimates that these transitional costs 
will amount to roughly £50 million–60 million 
(US$78 million–94 million) per year, on top of 
the country’s existing annual spending of about 
£175 million to publish and access research. If 
the costs were to be met by research funders, 
they would total about 1% of Britain’s annual 
science budget. 

The report does not recommend a figure for 
the cost of a gold article, but notes that the UK 
Wellcome Trust, a major biomedical research 
funder, last year paid an average of £1,422 per 
paper on behalf of the scientists it supports. 
Costs could be greater in more selective jour- 
nals — Nature’s editor-in-chief Philip Camp- 
bell says that the journal would have to charge 
more than £6,500 for gold open-access articles. 

Universities and funders will have to work 
out how to transform their payment systems 
under a gold regime, with each institution likely 
to set up a central publishing fund supported 
by a percentage of every research grant. What- 
ever the solution, academics will be much more 
aware of the costs of publishing. This could, in 
turn, modify their behaviour, with research- 
ers submitting papers to the journals they can 
afford to publish in, or trying to publish fewer, 
broader articles. 


GOING GREEN 

An alternative open-access model is already 
thriving around the world, and particularly 
in the United Kingdom. Under green open 
access, research funders can require that peer- 
reviewed papers be made openly accessible in 
online repositories, without the author paying 
a fee. This usually happens some months after 
publication, a time period that allows publish- 
ers to sell access to the paper for long enough 
to turn a profit. Researchers can also post pre- 
publication versions of their papers in institu- 
tional repositories. 

Paul Ayris, director of library services at 
University College London, says that scaling 
up green publishing would be a cheaper 
short-term route to expanding open access, together 
with a nationwide scheme to pay for researchers’ 
access to subscription journals en masse. “The 
gold route does nothing about publisher profits, 
which many commentators feel are already too 
high,” he says. Open-access advocate Stevan 
Harnad, a cognitive scientist at the University 
of Montreal, is even more critical of the report’s 
overt support for gold access. “Some publishers 
seem to be successfully persuading some poli- 
ticians that what is at issue is protecting their 
current revenue streams and modus operandi 
from the threat of green open access,” he says. 

RISE OF GOLD 

The world's gold open-access articles are 
rising as a share of the total. 


OPEN ACCESS IN THE UK 

Of the 85,215 research papers published by UK academics in 2010 (as indexed by Web of Science), around 5% 
were gold open access, whereby authors pay for open publication. Another 35% were green open access — 
published behind a pay wall and then put in a free repository. However, the proportion varied between disciplines. 


But the Finch group says that it was expressly 
asked to find sustainable ways to grow open 
access, which it says only a gold route can 
provide. “It’s not in the interests of UK scholarship 
to make recommendations which undermine 
the sustainability of the publishing 
industry,” says Philip Sykes, another Finch group 
member and a librarian at the University of 
Liverpool. Universities can use their collec- 
tive lobbying power to drive down both sub- 
scription and gold costs, he adds. Gold open 
access will eventually result in lower incomes 
for publishers anyway, Finch members note, by 
making the research-publishing market more 
transparent and competitive. 

That’s particularly worrying for learned 
societies, because they rely on subscription 
publishing for much of their income. The 
London-based Institute of Physics, for exam- 
ple, earns some £10 million each year — more 
than 60% of its total income — from publish- 
ing, which it spends on activities such as sci- 
ence education and outreach, says its president 
Peter Knight. “The mood of the community is 
to get costs down — but if scientific publishing 
only just covered its costs, an awful lot of our 
programmes would be in jeopardy,” he says. 

What matters now is how the agencies that 
support UK scientists require them to make 
their research freely available. Existing open- 
access mandates have been spottily enforced. 
The Wellcome Trust has only 55% compli- 
ance, although it will soon make grant funding 
conditional on open-access publishing. A 
similar condition from the US National Institutes 
of Health currently has 75% compliance. 

In March, Research Councils UK (the 
umbrella body for the United Kingdom's seven 
government-funded grant agencies) released 
a draft policy that suggested it, too, would 
toughen up on open access. The Higher Edu- 
cation Funding Council for England, another 
major research funder, could go the same way. 
But the devil will be in the detail, says Hall. “If 
research funders go soft on open access, the 
Finch report will be of only academic interest.” 

Most uncertain of all is how rapidly the 
United Kingdom's efforts might drive other 
countries towards open access. British scien- 
tists produce 6% of the research papers pub- 
lished worldwide each year, and the country 
could find itself paying to make its research 
free for others’ benefit. But there is growing 
momentum internationally. The European 
Commission hopes to push for an open-access 
mandate in its 2014-20 research-funding pro- 
gramme Horizon 2020, and the newly formed 
Global Research Council — a forum for fund- 
ing-agency heads worldwide (see Nature 485, 
427; 2012) — has open access on its agenda for 
its second meeting next year in Berlin. As the 
report concludes, “measures to promote open 
access need to be... international in scope if 
they are to achieve their full potential”. 


A workforce out of balance 


Too many biomedical PhDs and too few minorities are a demographic dilemma for the NIH. 


BY MEREDITH WADMAN 


The US biomedical workforce has a glut 
of young researchers but a dearth of 
some minority groups, members of 
which are struggling to establish themselves 
in the field. The double-barrelled problem is 
laid out in detail in a pair of reports presented 
to the US National Institutes of Health (NIH) 
in Bethesda, Maryland, on 14 June. 

NIH leaders have long worried about a steep 
increase in the number of biomedical PhDs, a 
consequence of the doubling of the NIH budget 
from 1998 to 2003. Now that boom is making 
it increasingly difficult for young scientists to 
launch academic careers (see ‘Swelling ranks’). 

“This is dysfunctional and it’s not sustainable 
in the long term,” says Shirley Tilghman, presi- 
dent of Princeton University in New Jersey and 
a co-chair of the working group that authored a 
report on structural problems in the workforce. 
It calls for several measures to address the over- 
supply of PhDs, including a six-year cap on the 
number of years that a graduate student can be 
supported by NIH funds and an increase in the 
proportion of students on career-oriented train- 
ing grants rather than on research grants. 

A second report, focusing on diversity, was 
spurred by a study published in Science last year, 
which found that after factors such as educa- 
tion and publication record are controlled for, 
black applicants are 10% less likely than white 
applicants to win NIH research funding (D. K. 
Ginther et al. Science 333, 1015-1019; 2011). 

The diversity report confirms that minority 
applicants have significantly reduced success 
rates for grant applications (see ‘Uneven 
playing field’). Confronted with data such as these, 
“there are a number of scientists of colour who 
feel, at the end of the day, ‘What’s the point?’” 
says Reed Tuckson, vice-president and chief 
of medical affairs at the health-insurance firm 
UnitedHealth Group in Minnetonka, 
Minnesota, and one of three co-chairs of the diversity 
working group. That sentiment “really, really 
bothers me”, he adds. “We have got to turn 
[that] around.” 

The report recommends, among other things, 
that the NIH launch a “bold”, well-funded com- 
petitive grant process to build infrastructure at 
institutions with a record of producing minor- 
ity scientists, and that it launch an experiment 
to make applicants’ identities and institutions 
anonymous in the review process. 

Francis Collins, director of the NIH, has 
promised to respond to the recommendations 
by December. 


SWELLING RANKS 


The number of US biomedical PhDs has ballooned in the past decade, driven by NIH funding of research 
assistantships. These typically offer less career development compared with traineeships and fellowships. 
Researchers become faculty members and win a first NIH grant significantly later in life now than in 1980. 
A growing number of US biomedical scientists are foreign workers. Early salaries are lower than in other fields. 
people are under-represented in biomedical fields. 
Minority applicants have lower success rates than 


Romanian prime minister 
accused of plagiarism 


Allegations prompt questions about government’s ability to tackle misconduct in academia. 


BY QUIRIN SCHIERMEIER 


Romania’s new government, still 
reeling from a misconduct scandal that 
forced its research minister to resign 
last month, has been hit by fresh allegations of 
plagiarism that strike at the very top. 

Prime Minister Victor Ponta has been 
accused of copying large sections of his 2003 
PhD thesis in law from previous publications, 
without proper reference. If the charges are 
substantiated, they could spark public pressure 
for Ponta to resign, say political insiders. The 
allegations are also raising fresh doubts about 
the government's ability to tackle corruption 
in the higher-education system. 

Nature has seen documents compiled by 
an anonymous whistle-blower indicating that 
more than half of Ponta’s 432-page, 
Romanian-language thesis¹ on the functioning of the 
International Criminal Court consists of 
duplicated text. Moreover, the thesis was 
republished with very minor amendments as a 
Romanian-language book in 2004 (ref. 2), and 
also forms the basis of a 2010 book on liability 
in international humanitarian law³. A former 
PhD student of Ponta’s, Daniela Coman, is 
named as co-author of the books. 

Substantial sections of text in all three 
publications seem to be identical, or almost so, to 
material in monographs written in Romanian 
by law scholars Dumitru Diaconu⁴ and Vasile 
Cretu⁵. They also feature direct Romanian 
translations of parts of an English-language 
publication by law scholar Ion Diaconu⁶. 

“The evidence of plagiarism is overwhelm- 
ing,” says Marius Andruh, a chemist at the 
University of Bucharest and president of the 
Romanian council for the recognition of uni- 
versity diplomas. If the allegations are borne 
out, “a serious discussion is needed in Romania 
and abroad to prevent this in the future,” says 
Andruh. 

“I understand that in law studies it can 
be necessary to copy extensive legal articles 
and definitions,” says Paul Dragos Aligica, a 
Romanian political scientist at George Mason 
University in Arlington, Virginia. But Ponta’s 
alleged plagiarism “goes way beyond that. It’s 
astonishing”, he adds. Ponta did not respond 
to Nature’s request for comment on the 
allegations, and Coman could not be contacted. 

Ponta, leader of the Romanian Social 
Democratic Party, took office as prime 
minister only last month, replacing Emil Boc, 
who stepped down in February following 
protests against austerity measures that 
he had introduced. Ponta obtained his 
PhD from the University of Bucharest while 
acting as Secretary of State in the government 
of an earlier prime minister, Adrian Nastase — 
who was also his PhD supervisor. 

“There is very clear evidence in these 
excerpts that the matter should be investigated 
further,” says Vlad Perju, a Romanian political 
scientist and director of the Clough Center for 
the Study of Constitutional Democracy at Bos- 
ton College in Chestnut Hill, Massachusetts. 

The episode follows the resignation last 
month of the education and research minister, 
computer scientist Ioan Mang, following 
accusations of plagiarism in at least eight papers⁷. 
An investigation of that case by the Romanian 
Research Ethics Council is ongoing. 

The latest allegations add to complaints 
about declining academic standards in Roma- 
nia. The previous government had introduced 


measures to make the country’s struggling 
science and education system more com- 
petitive and transparent, but the plans met 
ferocious opposition from large parts of the 
academic establishment, and have been sub- 
stantially relaxed by the current government. 

“It's more than unlikely that this govern- 
ment is fit to create institutional structures 
in science that Romania urgently needs,” says
Aligica. “How can it be, when some of its lead- 
ers don't seem to even remotely understand, 
or care about, the standards of good science?” 

Members of Romania’s post-communist 
elite — including many politicians — have 
been eager to acquire academic credentials. In 
the view of some critics, a number of private 
and public universities in the country are con- 
sequently degenerating into ‘degree mills’ that 
care little about the quality or novelty of the 
knowledge that they produce, and which are 
a breeding ground for academic plagiarism. 

“One could almost feel pity for all these guys 
who have power and money, and who are now 
craving intellectual recognition,” says Aligica. 
“Unfortunately these incidents just add to the 
disrepute of Romanian academic standards 
and create extra pressure that real Romanian 
scholars and scientists will now have to fight 
against.”
Many unregulated gold mines in Madre de Dios, Peru, leave a trail of environmental devastation in their wake.


ENVIRONMENT 


Peru battles the golden 
curse of Madre de Dios 


Attempts to reduce the environmental and health impacts of mining cause unrest. 


BY ELIE GARDNER IN LIMA 


“You go out to these communities that
are so incredibly poor, and there is
money buried in the dirt,” says Jason
Scullion. “It is not surprising that they want to
go out there and dig it up.”

Scullion, a graduate student in environ- 
mental and forest sciences at the University 
of Washington, Seattle, arrived in the Peru- 
vian region of Madre de Dios last Septem- 
ber, just after the price of gold hit a record 
high of US$68 per gram. Madre de Dios is at 
ground zero of Peru’s gold rush: an estimated 
30,000 artisanal and small-scale miners work 
in this lush Amazonian area, one of the most 
biologically diverse places on Earth. Scien- 
tists and conservationists are alarmed by the 
damage that mining is causing to the land and 
its people. 
In the 1930s, settlers armed with shovels,
picks and pans came to search the riverbanks 
for deposits of gold washed down from the 
Andes mountains. Now, mechanical diggers 
and dump trucks are much more common. 
Peru is the sixth-largest producer of gold in the 
world, and the metal was its main export in the 
first quarter of this year. Yet about 20% of Peru's 
bullion is mined illegally, using techniques that 
destroy forests and pollute local rivers — as well 
as depriving the government of an estimated 
US$305 million in taxes each year. The infor- 
mal operations rarely assess their effects on the 
environment, or develop plans for what to do 
with the mines once they are exhausted, and 
they leave behind mountains of sand and rock, 
dead trees and deep pits filled with murky water. 

Now the government is tightening the screws 
on illegal mining, and scientists monitoring 
its impacts are on the front lines of a battle 
between miners, environmental campaigners
and the authorities. 

In February, the Peruvian government 
banned mining in Madre de Dios outside a des- 
ignated 500,000-hectare corridor (see map) and 
ordered that all miners must formally register 
—a year-long process that requires the mine 
operators to produce a work plan, an envi- 
ronmental-impact assessment and a clean-up 
strategy, among other requirements. By setting 
aside a specific area for mining, the government 
hopes to regulate the industry more effectively 
and to protect parks and the territories of indig- 
enous people. But miners who have worked 

outside this newly des- 
Mining Federation in regional capital Puerto
Maldonado has campaigned strongly against 
the legislation. In March, three people died as 
police clashed with federation-organized pro- 
tests involving about 15,000 people. The gov- 
ernment began raiding mining camps outside 
the corridor in late March, but in many of those 
areas miners have since returned to work. 

Only around 4,000 miners in the region 
have met the government’s deadline of 13 June 
to register their mining activities, suggesting 
that many more intend to disregard the order. 
Miners who have not registered, or who break 
environmental laws, will face up to 10 years in 
prison. But locals say that corruption and lack 
of government resources in Madre de Dios will 
make enforcing the law difficult. 


SPOILT WILDERNESS 

At stake are broad swathes of fertile rainfor- 
est, including that in the Tambopata National 
Reserve and Manu National Park, which is the 
largest national park in Peru and has been des- 
ignated a World Heritage Site by the United 
Nations Educational, Scientific and Cultural 
Organization. “This is the epitome of a healthy
ecosystem,” says Enrique Ortiz, vice-president 
of the Amazon Conservation Association, 
based in Washington DC. “It’s the capital of 
biodiversity.” But mining is already beginning
to encroach on these areas, and is threatening
to become more widespread. A
study published last year (J. J. Swen- 
son et al. PLoS ONE 6, e18875; 2011) 
showed that mining is deforesting 
Madre de Dios faster than any other 
activity. Using satellite imagery, the 
study’s authors found that deforesta- 
tion in two prominent mining zones 
increased sixfold between 2003 and 
2009, destroying 6,600 hectares of 
wetlands and primary tropical forest. 
And they predicted that the trend will 
only get worse. 

Scullion came to Madre de Dios 
to find out whether that prediction is 
coming true. Funded by a 10-month 
grant from the US government’s Ful- 
bright Program, he is mapping 2 mil- 
lion hectares of the region, including 
many of the mining hotspots. Most pre- 
vious mapping studies have assigned 
land to only two categories — forested 
and deforested — lumping agriculture, towns 
and mining areas together, and making it dif- 
ficult to track the impact of the gold rush. Scul- 
lion’s study will use ten categories, including 
mining, agriculture and five different classes of 
forestry. 

Scullion says that there is a misconception 
among locals that researchers are against min- 
ing. Not so, he says — he just wants it to be 
done in a more sustainable way, staying out of 
parks and reserves, and ensuring that miners 
reforest areas after operations have finished. 
Scullion hopes that his maps will identify the
most vulnerable areas in Madre de Dios, as
well as the most biologically diverse, and says 
that they could provide the high-quality infor- 
mation that the government needs to decide 
whether land should be used for logging, min- 
ing or conservation. The data could also help 
to guide where — and how much — mining 
takes place in the designated corridor. 

Mining is also taking its toll on local people. 
An estimated 45-50 tonnes of mercury are 
used each year in Madre de Dios to extract the 
prized gold, and a large proportion of that ends 
up in rivers or is released into the atmosphere. 

Miners combine mercury with sediments
that contain gold, typically using their feet
to mix them in a bucket or drum, to form a
solid amalgam of the two metals. That amal- 
gam is then heated, often in frying pans over 
open flames in non-ventilated spaces, to boil 
off the mercury and leave gold behind. 

In March, Katy Ashe, a graduate student in 
environmental engineering at Stanford Uni- 
versity in California, published the first study 
(K. Ashe PLoS ONE 7, e33305; 2012) to show 
the scale of the health threat from mercury 
in Madre de Dios. She found that in mining 
zones, the proportion of people burdened with 
unhealthy levels of the metal — 6 micrograms 
or more per gram of dry hair tested — was 
more than twice that in Puerto Maldonado. 
Mercury poisoning can cause vomiting and 
diarrhoea and, in more extreme cases, brain
or kidney damage. Because the metal accu- 
mulates in rivers, elevated mercury levels 
were much more common in those who ate a 
lot of fish: 18% of people who ate 12 or more 
fish meals each month had unhealthy mercury 
levels, in contrast to just 6% and 7% of low and 
moderate fish consumers, respectively. 

That finding tallies with as-yet unpublished 
research by Luis Fernandez, a tropical ecologist 
at the Carnegie Institution for Science in Stan- 
ford, with whom Ashe is about to begin work- 
ing. In 2009, Fernandez discovered that the 
most-consumed fish species in Madre de Dios,
such as the mota (Calophysus macropterus) 
and doncella (Pseudoplatystoma fasciatum), 
had the highest levels of mercury. Fernandez is 
now leading a project to conduct a more exten- 
sive survey of the levels of mercury in fish and 
humans. 

Despite the overwhelming evidence of 
harm, Peruvians are still divided over the 
findings. Some march through Puerto Maldonado’s
main plaza shouting through megaphones
that mercury is killing everyone,
whereas others are willing to drink the
toxic liquid metal to prove it is safe.


Variations of the 
mercury-amalgamation technique have been 
used in gold mining for centuries, and it is dif- 
ficult to dislodge such deep-rooted practices 
among artisanal miners. The global gold- 
mining industry, including leading mines in 
Peru, has mostly switched to an extraction 
process that uses cyanide, and recovers about 
twice as much gold as does mercury amalga- 
mation. But cyanide requires more careful 
handling than mercury and few artisanal or 
small-scale miners have the necessary knowl- 
edge and skills. The more modern process of 
thiosulphate leaching might offer a non-toxic 
alternative, but it is most effective 
with very fine particles of gold — and 
Madre de Dios tends to yield larger, 
coarser grains. 

To reduce miners’ exposure to mercury,
non-governmental organizations
have distributed retorts that can cap- 
ture the toxic vapour. Two years ago, 
Peruvian engineer Carlos Villachica 
unveiled the ECO-100V, a US$4,500 
machine that uses water and jets of air 
to separate gold from sediments. And 
local development organizations such 
as Caritas Peru, based in Callao, and 
the Association for Integral Research 
and Development in Lima have devel- 
oped other mercury-free technologies 
for extracting gold. 

But Cesar Ascorra, director of Cari- 
tas Peru’s office in Madre de Dios, says 
that miners will not switch methods 
unless the alternative works just as 
quickly, recovers at least as much gold, and is no 
more expensive than mercury amalgamation. 
For now, he adds, miners are more worried 
about the government's demands for them to 
formalize their work, and until the price of mer- 
cury goes up, or its use is banned, there is little 
incentive for them to change their practices. 

Ortiz says that this underscores the value of 
research in the region — and the importance 
of disseminating the results through public- 
awareness campaigns. Studies such as Fernandez’s
are not done “for the sake of knowledge”,
says Ortiz. “This has a direction.”
Unlike the Acropolis, many classical sites are under threat from looting.


ARCHAEOLOGY 


Cuts leave Greek 
heritage in ruins 


Austerity measures damaging archaeological research. 


BY LEIGH PHILLIPS 


The economic and political turmoil in
Greece is not just jeopardizing the coun- 
try’s economic future, it is also having a 
devastating effect on the country’s rich cultural 
past, according to archaeologists in Athens. 
Last month, the Association of Greek 
Archaeologists warned that the economic poli- 
cies dictated by the European Union and the 
International Monetary Fund would cause “the 
destruction of both our country and our cul- 
tural heritage”. The austerity measures intended 
to cut government debt have forced the state 
archaeological service to slash staff numbers by 
more than 10%, with a further 35-50% reduc- 
tion possible. Research and excavations are 
being abandoned. Museums that can no longer 
afford to pay for security are being plagued by 
armed robbers. And organized criminals are 
exploiting the chaos in an explosion of illegal 
digs and the trafficking of illicitly procured 
antiquities. 
Despina Koutsoumba, president of the 
archaeologists’ association, says that the 
government no longer funds new research projects,
other than those involving foreign partners
to whom they are contractually obliged.
“Last year, a very important excavation 
underneath metro rail lines uncovered long 
walls connecting Athens to the port of Piraeus, 
built in the age of Pericles,” she says. “The walls 
to the port were vital and played a major role in 
the Peloponnesian War as they linked Athens 
to the sea, and their strength was in their fleet.” 
But the excavation was stopped uncompleted, 
she says, and there will be no opportunity to 
return to it once the rail lines are finished. 
Koutsoumba says that the ministry also 
wanted to halt work at a site in Thessaloniki, 
after major buildings from the time of Roman 
emperor Gaius Galerius around AD 305-11
were discovered on land where a shopping cen- 
tre was to be built. The archaeologists’ associa- 
tion won a small victory 
archaeologists in the service have been cut by
35%, to €670 (US$842) a month, and many 
have taken part-time jobs to supplement their 
income. Meanwhile, senior employees with the 
highest salaries and most experience are being 
forced into early retirement. Those who remain 
are reduced to bureaucratic ‘box-ticking’ — vis- 
iting construction sites to assess whether exca- 
vation is needed. They no longer have time for 
research or analysis, says Koutsoumba. 

“Excavation is of course part of our job, but 
we also need time and funds to do research, to 
publish our findings. We are not giving anything 
to the scientific community and letting people 
know about our new discoveries — we are just 
digging up pretty objects,” she says.

While legitimate archaeology is being ham- 
pered, looting is on the rise. The country is 
pockmarked with holes dug by the poor and 
desperate hoping for ‘buried treasure’, and 
organized criminals perform more profes- 
sional excavations. “There is no doubt that 
there has been an increase in the past 3-4 years
in both organized and amateur illegal digs and
this is definitely related to the cuts,” says Christos
Tsirogiannis, a forensic archaeologist and
researcher at the University of Cambridge, UK, 
who specializes in investigating the criminal 
networks behind trafficking in antiquities. He 
escorts the Greek police art squad on raids to 
identify looted antiquities. 

“There is always a rise in this form of crime 
in regions rich with antiquities during times 
of crisis — in recent years in Egypt, Iraq and 
Afghanistan,” says Tsirogiannis.

He says that auction houses and museums 
are not doing enough to avoid dealing in artefacts
of “unprovenanced origin”. “I wouldn't
be surprised if in the coming years there is a 
large increase in the number of unprovenanced 
Greek antiquities sold openly,” he says.

Christie’s, the London auction house, has 
strict internal policies to ensure they only offer 
objects for sale legally, says Matthew Patton, 
head of communications. 

The European Commission denies that 
spending cuts are to blame. “Greece has 
received very large sums that go towards cul- 
tural heritage. Many cultural institutions have 
been saved in Greece because of the work of 
the European Union,” says Dennis Abbott,
spokesman for Androulla Vassiliou, European 
Commissioner for culture and education. “But 
there are limits to what we can do. The main 
responsibility lies with the states,” he says.


CORRECTION 

The News story ‘Journal offers flat fee for 
“all you can publish”’ (Nature 486, 166; 
2012) stated that all co-authors on a paper 
must be members of PeerJ. In fact, only 12 
co-authors need to be paying members. 

It also wrongly noted that mBio does not 
assess for impact or importance: it does. 
AN ILL WIND


WITH TURBINES THREATENING SOME BIRD AND BAT POPULATIONS, 
RESEARCHERS ARE SEEKING WAYS TO KEEP THE SKIES SAFE FOR WILDLIFE. 


Marc Bechard turned a worried eye skywards as he walked among the
limestone hills at the southern tip of Spain. It was October 2008,
and thousands of griffon vultures, along with other vulnerable raptors,
were winging towards the Strait of Gibraltar and beyond to Africa.
But first they had to navigate some treacherous airspace. The landscape
on either side of the strait bristles with wind turbines up to 170 metres
high, armed with blades that slice the air at 270 kilometres per hour.

Bechard, a biologist at Boise State University in Idaho, and colleagues
from the Doñana Biological Station in Seville, Spain, had been hired to
help the birds make it safely past 13 wind farms in Cadiz province. Each
time the researchers spotted a raptor heading towards a turbine, they
called the wind farm’s control tower. Within minutes the blades slowed
to a stop, and one more migrating bird soared past unharmed. Then the
turbine swung back into action.

When the biologists weren't looking up at the sky, they were scouring
the ground for carcasses of griffon vultures (Gyps fulvus), Spanish imperial
eagles (Aquila adalberti) and other species. The Spanish Ornithological
Society in Madrid estimates that Spain’s 18,000 wind turbines may be
killing 6 million to 18 million birds and bats annually. “A blade will cut
a griffon vulture in half,” says Bechard. “I’ve seen them just decapitated.”

Wind turbines kill far fewer birds in general each year than do many
other causes linked to humans, including domestic cats and collisions with
glass windows. But wind power has a disproportionate effect on certain
species that are already struggling for survival, such as the precarious US
population of golden eagles (Aquila chrysaetos canadensis).

“The troubling issue with wind development is that we're seeing a
growing number of birds of conservation concern being killed by wind
turbines,” says Albert Manville, a biologist with the
US Fish and Wildlife Service in Arlington, Virginia. 

The deaths caused by turbines have the potential 
to harm not only wildlife, but also the wind-energy industry, which is 
the fastest-growing source of power worldwide, according to the World 
Bank. With critics vilifying wind turbines as ‘bird blenders’, wind companies,
governments and researchers are teaming up to mitigate the problem
before it reaches a crisis point. Cadiz province, for example, requires
all wind-energy projects to consider environmental issues, and helps to 
fund research on reducing any damage. 

The early signs are that with targeted efforts, wind power and wildlife 
can cautiously coexist. Bechard and his colleagues, for example, lowered 
mortality at the Cadiz wind farms by 50%, with only a 0.07% loss in 
energy production¹. Others are finding that minor changes in the design
or operation of wind farms can bring major reductions in animal deaths. 

Bechard and others believe research is crucial to making wind energy 
viable: “In the long run it'll save a lot of money and a lot of headaches.”


THE GATHERING STORM 
Wind power is poised to take off as the world seeks new sources of 
renewable energy. The industry is growing most quickly in China, 
which plans a 60% increase in wind power in the next three years. The 
US Department of Energy is aiming for a sixfold jump by 2030, and 
the European Union is working towards supplying 20% of its energy 
demand using renewable sources by 2020, much of that from wind. 
But the rapid expansion of wind power can harm wildlife in multiple 
ways. Beyond direct collisions with turbines, wind farms threaten species 
by displacing habitat. And bats can develop fatal internal haemorrhaging
as a result of air-pressure changes when they fly
through the wake of a spinning blade.

SOURCE: A. MANVILLE, US FISH AND WILDLIFE SERVICE

Raptors favour sites ideal for turbines, like these in Cadiz, Spain.

The industry maintains that the effects on 
wildlife are minor. Although there are only a 
few, limited estimates of bird fatalities at a national level, the available 
data for the United States suggest that wind farms account for a tiny 
fraction of avian deaths (see ‘Bird killers’). 

But the concern is that turbines threaten species that are already 
struggling, such as bats, which in North America have been hit hard by 
white-nose fungus. Another vulnerable group is raptors, which are slow 
to reproduce and favour the wind corridors that energy companies covet. 
“There are species of birds that are getting killed by wind turbines that do 
not get killed by autos, windows or buildings,” says Shawn Smallwood,
an ecologist who has worked extensively in Altamont Pass, California, 
notorious for its expansive wind farms and raptor deaths. Smallwood has 
found that Altamont blades slay an average of 65 golden eagles a year².
“We could lose eagles in this country if we keep on doing this,” he says. 

Other species at risk include the critically endangered California condors
(Gymnogyps californianus), which number only 226 in the wild,
and the few hundred remaining whooping cranes (Grus americana),
concentrated in the central United States. Biologists can’t say whether 
the increase in wind farms will cause the collapse of these or other bird 
species, which already face many threats. But waiting for an answer 
is not an option, says Smallwood. “By the time we do understand the 
population-level impacts, we might be in a place we don’t want to be.”


DAMAGE CONTROL 
In Cadiz, temporarily shutting down turbines has worked because the 
biggest threat is to migratory birds, which pass through only occasion- 
ally. Similar methods could reduce mortalities along the migratory 
bottlenecks in Central America, Europe and Asia, says Miguel Ferrer, 
a conservation biologist at Doñana and a co-author of the Cadiz study.

But that tactic will not work in Altamont Pass, which has both migra- 
tory and permanent avian populations. Instead, companies there are 
making headway by replacing small, ageing turbines with fewer large 
ones. Choosing sites carefully can help, too. “Raptors do not use the 
landscape randomly,” explains Doug Bell, wildlife programme manager
with the East Bay Regional Park District, which manages parklands and 
monitors wind farms around Altamont. 

When the Buena Vista Wind Energy Project at Altamont replaced 
179 turbines with 38 taller ones 
in 2006, Smallwood advised the 
company to avoid ridge saddles 
between hills and other hotspots 
for raptor traffic. Since then, golden 
eagle fatalities at Buena Vista have 
dropped by 50% and other raptor 
deaths by 75%, says Smallwood. 

In the eastern United States, Todd 
Katzner, a biologist at West Vir- 
ginia University in Morgantown, 
finds troublesome locations using 
a tracker designed to fit on golden 
eagles. “We can identify places where 
there are wins for both sides,” he says.
Moving a turbine site by a few hun- 
dred metres can substantially reduce 
the risk of collisions, says Katzner. 

Companies say that they have
learned from past mistakes. “That
there hasn't been another [location
like] Altamont Pass for years should
be an assurance that the wind
industry has gotten better at siting,”
says Stu Webster, director of permitting
and environmental affairs
at Iberdrola Renewables, a renewable-energy firm in Portland, Oregon.


BIRD KILLERS

Wind farms kill fewer birds than most other hazards, but pose a
particular threat to species including raptors. Estimates of annual
fatalities, shown for the United States, are highly uncertain.


Communication towers:
5 million to 6.8 million


Wind turbines:
100,000 to
about 440,000


Automobiles:
60 million to 80 million


Power line electrocutions and collisions:
hundreds of thousands to 175 million

Sometimes a slight change in procedures can make a big difference. 
For example, most turbines are set to turn on when wind speeds reach 
4.0 metres per second. But when the Iberdrola Renewables Casselman 
Wind Project in Pennsylvania increased the threshold to 5.5 metres per 
second, it slashed deaths of bats, which don’t fly as much in high
winds, by 93% while shaving just 1% off of power production, says Ed
Arnett, who conducted a study there³ while working at Bat Conservation
International in Austin, Texas.

Some wind farms are betting on technology to make a difference. The 
MERLIN radar system, made by DeTect in Panama City, Florida, scans 
the skies for up to 6.5 kilometres around and uses algorithms to detect 
incoming flocks and even individual birds and bats, says Gary Andrews, 
chief executive of DeTect. Iberdrola Renewables and Pattern Energy, a 
wind-power company based in San Francisco, California, are using the 
system at wind farms, as is Torsa Renewables, based in Malaga, Spain. 

Some researchers question the effectiveness of the radar. In 2010, a US 
Fish and Wildlife Service biologist saw a blade hit an American white 
pelican (Pelecanus erythrorhynchos) at Iberdrola’s Peñascal wind farm in
Texas, which was using MERLIN. But Iberdrola says that the system there 
is set up to prevent mass mortalities, not to detect individual birds. 

According to Iberdrola, birds are most at risk in fog or other periods 
of low visibility, so Peñascal sometimes shuts down turbines during bad
spells, unless the radar reveals that there are no birds around. Generally, 
however, the radar indicates that birds avoid the turbines by themselves. 
“We have terabytes’ worth of data that are showing us how birds actually 
react to wind farms,” says Webster.

Researchers would love to see such data, but so far they have not been 
given access. “I haven't seen any results in the published literature,” says
Arnett, “so in my mind, [the radar] remains untested.”

Companies worry that any monitoring data that they release could 
be used against them in lawsuits by environmental organizations or 
in political attacks from groups that support the fossil-fuel industries 
and want to scuttle wind power. The American Wind Wildlife Insti- 
tute, a coalition of industry and conservation organizations based in 
Washington DC, is attempting to remedy the situation by creating a 
limited-access data repository that protects company privacy. The insti- 
tute expects to complete the pilot phase of the project by mid-summer. 

Even as they start to open up, some sectors of the wind industry 
acknowledge that they should do more to head off problems. “We as an 
industry need to do a better job of 
incorporating mitigation strate- 
gies into our economics,” says John 
Calaway, director of wind develop- 
ment at Pattern Energy. 

The strategies seem to be working
in Cadiz, but Bechard still frets when
he sees vultures on the horizon. “It’s
always nerve-wracking” wondering
whether the turbines will stop in
time, he says. When the birds pass
safely, he breathes a sigh of relief. But
with wind farms popping up around
the globe, Bechard worries about
what the vultures will encounter as
they disappear into the distance.


Buildings (strikes to building glass and lighted buildings):
100,000 to 1 billion+


Cats (domestic and feral):
365 million to 1 billion


Meera Subramanian is a freelance 
writer in Cape Cod, Massachusetts. 


Pesticides:
67 million to 90 million


1. de Lucas, M., Ferrer, M., Bechard, M. J. & Muñoz, A. R. Biol. Conserv. 147, 184-189 (2012).
2. Smallwood, K. S. & Karas, B. J. Wildlife Manage. 73, 1062-1071 (2009).
3. Arnett, E. B., Huso, M. M. P., Shirmacher, M. R. & Hayes, S. P. Front. Ecol. Environ. 9, 209-214 (2011).


21 JUNE 2012 | VOL 486 | NATURE | 311 


© 2012 Macmillan Publishers Limited. All rights reserved 




A BROKEN CONTRACT


Late in May, the direct-to-consumer
gene-testing company 23andMe proudly 
announced the impending award of 
its first patent. The firm’s research on 
Parkinson's disease, which used data from sev- 
eral thousand customers, had led to a patent on 
gene sequences that contribute to risk for the 
disease and might be used to predict its course. 
Anne Wojcicki, co-founder of the company, 
which is based in Mountain View, California, 
wrote in a blog post that the patent would help 
to move the work “from the realm of academic 
publishing to the world of impacting lives by 
preventing, treating or curing disease”. 

Some customers were less than enthusiastic. 
Holly Dunsworth, for example, posted a com- 
ment two days later, asking: “When we agreed 
to the terms of service and then when some of 
us consented to participate in research, were 
we consenting to that research being used to 
patent genes? What's the language that covers 
that use of our data? I can’t find it.”

The language is there, in both places. To be 
fair, the terms of service is a bear of a docu- 
ment — the kind one might quickly click past 
while installing software. But the consent form 
is compact and carefully worded, and approved 
by an independent review board to lay out 
clearly the risks and benefits of participating 
in research. “If 23andMe develops intellectual 
property and/or commercializes products or 
services, directly or indirectly, based on the 
results of this study, you will not receive any 
compensation,” the document reads.

The example points to a broad problem in 
research on humans — that informed consent 
is often not very well informed (see ‘Reading 
between the lines’). Protections for participants 
have been cobbled together in the wake of past 
controversies and have always been difficult to 
uphold. But they are proving even more problematic
in the ‘big data’ era, in which biomedical
scientists are gathering more information
about more individuals than ever before. Many 
studies now include the collection of genetic 
data, and researchers can interrogate those 
data in a growing number of ways. Several
US states, including California, are considering
laws that would curtail the way in which
researchers, law-enforcement officials and private
companies can use a person’s DNA.

AS RESEARCHERS FIND MORE USES
FOR DATA, INFORMED CONSENT HAS
BECOME A SOURCE OF CONFUSION.
SOMETHING HAS TO CHANGE.

The research coordinators who develop 
consent forms cannot predict how such data 
might be used in the future, nor can they 
guarantee that the data will remain protected. 
Many people argue that participants should 
have more control over how their data are 
used, and efforts are afoot to give them that 
control. Researchers, meanwhile, often bristle 
at the added layers of bureaucracy wrought by 
the protections, which sometimes provide no 
real benefits to the participants. The result is 
a mess of opinions and procedures that sow 
confusion and risk deterring people from par- 
ticipating in research. 

“A lot of times researchers will say, ‘Why
can’t we just go back to the way it was?’, which
was basically that we take these samples and
people do it for altruistic reasons and everything’s
lovely,” says Sharon Terry, president of
the patient-advocacy group Genetic Alliance 
in Washington DC. “That worked in a prior 
age. I don’t think it works today.” 

The concept of informed consent was 
first set out in the Nuremberg Code, a set of 
research-ethics principles adopted in the 
wake of revelations of torture by Nazi doctors 
during the Second World War. But in recent 
years, a series of mishaps over consent have 
undermined support for research. In 2004, 
for example, scandal erupted in the United 
Kingdom after parents found out that from 
the late 1980s to the mid-1990s doctors and 
researchers had removed and stored organs 
and tissues from patients — including infants 
and children — without parental consent. New 
laws were passed that required explicit consent 
for such collections. 

Then, in 2010, the Havasupai tribe of Ari- 
zona won a US$700,000 settlement against 
Arizona State University in Tempe. Individuals 
believed that they had provided blood 
for a study on the tribe's high rate of diabetes, 
but the samples had also been used in mental- 
illness research and population-genetics stud- 
ies that called into question the tribe's beliefs 
about its origins. In the settlement, the uni- 
versity’s board of regents said that it wanted to 
“remedy the wrong that was done”. 

The cases illustrate the divide between 
researchers and the public over what people 
need to know before agreeing to participate in 
research. 

Many of the recent concerns over consent 
are driven by the rapid growth of genome 
analysis. Decades ago, researchers weren’t 
able to glean much information from stored 
tissue; now, they can identify the donor, as well 
as his or her susceptibilities to many diseases. 
Researchers try to protect the genetic data 
through technological and legal mechanisms, 
but both approaches have weaknesses. 

It is not enough to strip out any 
information that would identify the donor, 
such as names and full health records, before 
the data are stored. In 2008, geneticists showed 
that they could easily identify individuals 
within pooled, anonymized data sets if they had 
a small amount of identified genetic informa- 
tion for reference (N. Homer et al. PLoS Genet 
4, e1000167; 2008). And it may become pos- 
sible to identify a person in a public database 
from other information collected during a 
study, such as data on ethnic background, loca- 
tion and medical factors unique to the study 
participants, or to predict a person’s appearance 
from his or her DNA. 

Even legal mechanisms have vulnerabilities. 
In 2004, Jane Costello, a social psychologist at 
Duke University in Durham, North Carolina, 
was forced to go to court to defend the con- 
fidentiality of patient records from the Great 
Smoky Mountains Study. The study, which is 
just going into its third decade, examines emo- 
tional and behavioural problems in a cohort 
of people who enrolled as adolescents. A par- 
ticipant in the study was testifying against her 
grandfather, John Trosper “JT” Bradley, who 
had been accused of sexual abuse. JT’s lawyers 
subpoenaed the granddaughter’s records from 
the study in hope that the information would 
undermine her credibility as a witness. 

It meant a major crisis of confidence for 
Costello. “I was telling 1,400-plus people every 
time we saw them that ‘your data are absolutely 
safe’, and now I was in a position where I was 
told, ‘No, that’s not true’,” she says. After 
Costello’s day in court, in August 2004, the 
records remained sealed, but mostly because 
the judge did not believe that they would exonerate 
JT. The result provided no clarity about 
patient protections. 


BETTER MODELS FOR CONSENT 

One solution is to keep genetic information 
separate from demographic data. The BioVU 
databank at Vanderbilt University Medical 
Center in Nashville, Tennessee, for instance, 
contains DNA samples from patients treated 
at the hospital — 143,939 people as of 11 June. 
The DNA is linked to health records in a second 
database, called a ‘synthetic derivative’, in 
which the data are anonymized and scrambled 
in ways that, its creators say, make it difficult 
for anyone to work back from the database to 
verify a patient’s identity. Sample-collection 
dates are altered, for example, and some 
records are discarded at random, so that it is 
not possible to know that someone is in the 
database just because he or she was treated at 
the hospital. Even researchers who work with 
the data cannot determine whose data they 
are using. The databank expects to include as 
many as 200,000 individuals by 2014, making 
it one of the largest collections of linked genetic 
and health records in the world. 

But when it comes to consent, BioVU takes 
a different approach from many other 
programmes. Patients don’t choose to 
participate; rather they are given the chance to 
opt out. Patients are asked to sign a ‘consent to 
treat’ form every year. It includes a box that they 
can tick to keep their DNA out of the database. 
That model helps BioVU to collect many more 
samples, and much more cheaply, than other 
projects can. 

The opt-out model — which is used in only a 
few other places — troubles Misha Angrist, a 
genome policy analyst at Duke University, who 
says that it risks taking advantage of people 
when they are ill. “Even a routine visit to the 
clinic can be a vulnerable moment, and they're 
saying, ‘Would you mind doing this for future 
generations, to help people just like you?’” 

And legal challenges have shown the weak- 
nesses of opt-out policies. Health officials are 
now destroying millions of blood samples 
taken from newborn babies in Texas and 
Minnesota because the families were not ade- 
quately informed that the samples, collected to 
screen for specific inherited disorders, would 
also be used in research. 

Vanderbilt officials and researchers counter 
that they have run extensive public campaigns 
to ensure that people in Nashville are aware 
of BioVU and are comfortable with the way 
it works. They regularly consult a commu- 
nity advisory board about the project. And 
Vanderbilt’s approach actually goes above and 
beyond what is required by federal law; because 
the synthetic derivative includes de-identified 
data, it doesn't legally require informed consent 
at all. Last July, the US Department of Health 
and Human Services signalled that it might be 
rethinking the rules that exempt de-identified 
data from the consent requirement, as part of a 
broad overhaul of research-ethics regulations. 
Irrespective of the outcome, obliterating 
patient identities has drawbacks. Researchers 
can't perform some types of research on the 
scrambled data. Because dates are changed, 
studies on the timing of influenza infections, 
for example, are impossible. And patients can't 
be told if the research has revealed that they 
carry individual genetic risks linked to disease. 


READING BETWEEN THE LINES 

Despite the work that goes into making consent forms clear and detailed, some participants say they are 
confused by the wording. 

5. What are the benefits and risks of participating? 
If 23andMe develops intellectual property and/or commercializes products or services, directly or indirectly, 
based on the results of this study, you will not receive any compensation. 

23andMe received a patent in May for its work on Parkinson's disease. 
Some participants did not expect it to seek intellectual-property rights. 
From 23andMe’s research consent form www.23andme.com/about/consent/ 

2. I have been informed that the purpose of the research is to study the causes of behavioral/medical disorde 

Many participants in the Medical Genetics at Havasupai study in Arizona in the 1990s were 
unaware of how broad the research goals were and what kind of studies would be performed. 
Courtesy of Pilar Ossorio 

Re-identification 
We are quickly learning that with powerful computers and good mathematicians, it is increasingly 
possible to uniquely identify people inside large data sets ... Researchers will sign a contract in which 
they agree that even if they were able to identify you, they won't do it. 

In an attempt to be more transparent about privacy risks, the Consent to Research project’s ‘Portable Legal 
Consent’ essentially states that although anonymity cannot be guaranteed, researchers must pledge to uphold it. 
Courtesy of John Wilbanks Consent to Research 


FULL DISCLOSURE 

Returning study results to research participants 
has been another thorny issue for consent. Doc- 
tors might learn about genetic predispositions 
to disease that are separate from the ailments 
that led a patient to participate in the research 
in the first place, but it is not clear what they 
should do with this information. 

UK researchers, for example, are forbidden 
from sharing genetic results with participants. 
But US research societies, such as the Ameri- 
can College of Medical Genetics and Genom- 
ics in Bethesda, Maryland, are moving towards 
adopting standards that would encourage the 
practice for some types of findings, such as 
those that are medically relevant. 

Some countries, such as Germany, Austria, 
Switzerland and Spain, are already feeding 
back such information. And some clinical 
sequencing programmes are considering offer- 
ing patients ‘tiered’ consent, in which people 
can decide whether to be told about their data 
and how much they want to learn. 

This is what Han Brunner, a geneticist at 
the Radboud University Nijmegen Medical 
Centre in the Netherlands, had hoped to do. 
Last year, he began a project to sequence the 
exomes — the protein-coding regions of the 
genome — of 500 children and adults, looking 
for the genetic causes of intellectual disabilities, 
blindness, deafness and other disorders. Brun- 
ner proposed allowing participants to choose 
from three options: they could learn everything 
that researchers had divined about disease sus- 
ceptibility; just information relevant to the dis- 
ease for which their genomes were examined; 
or no information at all. Ethics reviewers shot 
down his proposal. “They said that in practice, 
it would be impossible for people to draw those 
lines, because people giving consent cannot 
foresee all the possible outcomes of the study,” 
Brunner says. Instead, everyone participating 
in the studies must agree to learn all medically 
relevant information arising from the analysis 
of their genomes. As a consequence, Brunner 
recently had to tell the family of a child with a 
developmental disability that the child also has 
a genetic predisposition to colon cancer. Not all 
researchers endorse the idea of informing chil- 
dren about diseases that might affect them as 
adults. In this case, doctors recommended early 
screening, and Brunner says, “the family handled 
it very well; they said, ‘This is not what we 
anticipated, but it’s useful information’.” 

Many of the studies done now ask patients 
to give consent for research linked to particular 
investigators or diseases. But that means that 
researchers cannot pool data from separate 
studies to tackle different research questions. 
Many researchers say that the obvious solu- 
tion is a broad consent document that gives 
researchers free rein with the data. But many 
non-scientists think participants should be able 
to control how their data are used, says lawyer 
Tim Caulfield of the University of Alberta in 
Edmonton, Canada, who has surveyed patients 
about this idea. “There’s an emerging consensus 
within the research community about the need 
to adopt things like broad consent, but that 
hasn't translated out to the legal community or 
to the public,” he says. 

Another solution might be called ‘radical 
honesty’. A US project called Consent to 
Research, which aims to provide a large pool of 
user-contributed genomic and health data, has 
devised what it calls a ‘Portable Legal Consent’, 
which allows anyone to upload information 
about himself or herself, such as direct-to-consumer 
genetic results and lab tests ordered 
through medical providers, to an interface that 
strips the data of identifiers. It makes the data 
widely available to researchers under broad 
guidelines, but also requires data donors to go 
through a much more rigorous consent pro- 
cess than most studies do. The Portable Legal 
Consent specifically informs participants that 
researchers might be able to determine their 
identities, but that they are forbidden from 
doing so under the project's terms of use. 

Such approaches could help scientists by 
giving them access to a trove of data with no 
restrictions on use. But the participant pro- 
tections system that is in place might not be 
ready for such frank dialogues, says Angrist, 
who serves on one of Duke’s institutional 
review boards. 

While reviewing a research proposal for a 
large biobank, for example, Angrist suggested 
that the researchers send the participants an 
annual e-mail explaining how their samples 
were being used, and thanking them for donat- 
ing their time and tissue. The review board 
voted this suggestion down after its chair argued 
that e-mailing the patients would create a prob- 
lem in light of the Health Insurance Portability 
and Accountability Act (HIPAA) — the US law 
that guarantees the privacy of health records. 
“The irony is that the HIPAA is supposed to 
protect people, and what I was hearing was, 
‘We can’t talk to people because we’re too busy 
protecting them’,” Angrist says. “Institutions use 
informed consent to mitigate their own liabil- 
ity and to tell research participants about all the 
things they cannot have, and all the ways they 
can’t be involved. It borders on farcical.” 

But as patient data become more precious 
to researchers, and as advocacy organizations 
become more involved in driving research 
agendas and in funding the work, such pater- 
nalistic attitudes will probably not survive, says 
Terry. She adds that technologies that allow 
research participants to control and track how 
researchers use their data will soon catch on. 
These approaches could benefit patients, who 
gain transparency and control over their data, 
and researchers, who gain access to richer 
data sets. “I think we're going to have to get 
to a place where consenting people becomes 
customizable easily through technology, and 
we're not there yet,” Terry says. SEE EDITORIAL P.293 


Erika Check Hayden writes for Nature from 
San Francisco, California. 


Summer workshops at the Aspen Center for Physics give researchers respite from their academic duties. 


Aspen physics turns 50 


Michael S. Turner reflects on how mountain serenity has bred big 
breakthroughs at the Aspen Center for Physics in Colorado. 


Theoretical physicists are an odd lot: 
bad communicators (Niels Bohr 
and Werner Heisenberg); brilliant 
showmen (Richard Feynman and George 
Gamow); the ‘strangest man’ (Paul Dirac); 
lots of Hungarians (Leo Szilard, Edward 
Teller and Eugene Wigner); bad hair (Albert 
Einstein); and too few women. They don't 
need fancy equipment — a pencil and paper 
will do. But they do like a serene environ- 
ment, with blackboards and other people of 
their ilk, in which to come up with big ideas: 
among them relativity, the Big Bang, quan- 
tum mechanics and the atomic bomb. 

Over the past 50 years, the Aspen Center 
for Physics (ACP), nestled in a beautiful valley 
at 2,400 metres above sea level in the Colo- 
rado Rocky Mountains, has provided a ‘circle 
of serenity’ during the summer months for 
10,000 theoretical physicists, including 53 
Nobel laureates, from 65 countries. The centre 
can lay claim to the string-theory revolution, 
the birth of the arXiv preprint archive and 
to setting the agenda for condensed-matter 
physics. Its history is tied to the revival of a silver-mining 
town and the American entrepreneurial 
spirit, and features a fascinating cast of 
characters, from philosopher Mortimer Adler 
to journalist Hunter S. Thompson. 

The centre's story cannot be separated from 
that of the town. The 1893 repeal of the Sher- 
man Silver Purchase Act demonetized silver 
and almost overnight turned Aspen, with 
a population of about 15,000, into a ghost 
town. Elizabeth Paepcke, wife of Chicago 
industrialist Walter Paepcke, visited in 1939, 
describing it as a place that “had slept since 
1893”. She found 700 residents, decaying 
Victorian buildings and wonderful skiing. She 
persuaded her husband, a devotee of German 
writer Johann Wolfgang von Goethe, to visit 
in 1945. Seeing it as the ideal place to bring 
together the three aspects of life — economic, 
cultural and physical — he invested millions 
of dollars in rebuilding it. In 1946, he formed 
the Aspen Skiing Corporation, which remains 
the financial engine of the valley. 

Aspen’s cultural transformation came with 
the 1949 Goethe bicentennial. Organized by 
Walter Paepcke (with guidance from Adler 
and Robert Maynard Hutchins, then chan- 
cellor of the University of Chicago in Illi- 
nois), the bicentennial aimed to rehabilitate 
German culture and to revive humanism in 
the wake of the Second World War and the 
dawn of the atomic age. Around 2,000 people 
gathered in a tent designed by architect 
Eero Saarinen for the 20-day celebration. 
They included German-French theologian 
Albert Schweitzer, pianist Artur Rubinstein, 
philosopher José Ortega y Gasset and poet 
Stephen Spender. The event led to the for- 
mation of the Aspen Music Festival (now 
the Aspen Music Festival and School) and, 
in 1950, of the Aspen Institute for Humanis- 
tic Studies (now the Aspen Institute). Just as 
Paepcke had imagined, today the town brings 
together culture, wealth and athleticism — 
and a touch of glitz. 


BEGINNINGS 

The ACP’s origins lie with physicist George 
Stranahan, heir to the fortunes of the Champion 
spark-plug company in Toledo, Ohio, and 
a graduate student at the Carnegie Institute 
of Technology in Pittsburgh, Pennsylvania. 
In the late 1950s, he decided that he would 
rather do his physics during the summer 
months in the mountains of Colorado, 
where fishing and hiking provided a more 
enjoyable backdrop than did an office in 
steamy Pittsburgh. After a few years, he real- 
ized that theoretical physics was best done 
with others, and set out to draw physicists 
to Aspen. When he later moved to Colorado, 
Stranahan became the landlord and close 
friend of Thompson. 

Stranahan got things going with help from 
Michael Cohen, a condensed-matter theorist 
at the University of Pennsylvania in Philadel- 
phia, who was one of Feynman's few PhD stu- 
dents, and Robert Craig, executive director 
of the Aspen Institute. The Stranahan fam- 
ily’s Needmor Fund paid for the first build- 
ing, Stranahan Hall, designed by Bauhaus 
architect Herbert Bayer, who also planned 
the Aspen Institute campus. Cohen found 
the physics talent, and Craig convinced the 
Aspen Institute to create a physics division. 

In spring 1962, a letter was sent out to the 
physics community tentatively announcing 
“the possibility of a summer physics insti- 
tute”. The purpose was “to provide a place 
for physicists to work on their own problems 
during the summer, in a stimulating physics 
atmosphere, and in a location with pleasant 
surroundings and natural beauty”. That year, 
42 brave souls came to Aspen to “pursue 
their work with minimal distractions”. 

The Aspen formula was — and still is — to 
bring the best theorists together in an infor- 
mal setting for weeks or months, free from 
their usual responsibilities of students and 
teaching, and isolated from distractions. 
There, they could talk with one another, 
think big thoughts and come up with game- 
changing ideas. Physicists were housed two 
to an office and held discussions on a patio 
with a small blackboard, often accompanied 
by beautiful music from the town’s music tent. 
Graduate students were excluded, differenti- 
ating Aspen from teaching summer schools. 
For many years, the buildings had only a 
handful of public phones. This put a limit on 
interruptions, but sometimes provided enter- 
tainment. I once overheard particle physicist 
Murray Gell-Mann of the California Institute 
of Technology quipping, “I don’t know the 
English for it, but the Japanese is ...”. 

An early attempt to bring together physi- 
cists and philosophers failed because of 
Adler's insistence that they agree on “the pyramid 
of knowledge”, which had physics at the 
bottom and philosophy on top. Because of the 
clash of cultures and egos, the centre did not 
stay tied to the Aspen Institute for long and 
became an independent entity in 1968. Since 
then, the ACP has been run by physicists who 
volunteer their time, helped by just two full- 
time staff. More than 200 top theorists have 
shaped and guided the centre — including 
five Nobel laureates and Stephen Hawking. 

An early grant from the Sloan Founda- 
tion was crucial, and Hans Bethe of Cornell 
University in New York donated part of his 
1967 Nobel prize money. Bethe Hall, built in 
1978, was named in his honour. Robert Rath- 
bun Wilson, the first director and builder of 
Fermilab in Batavia, Illinois, visited Aspen 
in 1967 and convinced the US Department 
of Energy to build a large, temporary office 
building there, where Fermilab’s facilities for 
particle-physics experiments were designed. 
The construction of Hilbert Hall, named 
after mathematician David Hilbert, almost 
tripled the number of physicists that the cen- 
tre could accommodate. 


SOLID FOUNDATIONS 

In 1972, the US National Science Foundation 
became the ACP’s main funder, with support 
from other US science agencies including 
the Department of Energy and NASA. In 
the mid-1990s, a $3-million fund-raising 
campaign led by astrophysicist David Sch- 
ramm of the University of Chicago financed 
the final and largest building, Smart Hall. 
Contributions came from physicists, friends 
in the Aspen community and the Smart 
Family Foundation in Connecticut. 

George Stranahan, Michael Cohen and Robert 
Craig (left to right): the centre’s three founders. 

© 2012 Macmillan Publishers Limited. All rights reserved 

Three figures played a major part in estab- 
lishing the ACP in the theoretical commu- 
nity: Philip Anderson of Princeton University, 
Bethe and Gell-Mann. Coincidentally, they 
all began coming to Aspen two years before 
receiving a Nobel prize. They set the agenda, 
served as scientific magnets and gave early 
legitimacy. Any high-energy theorist would 
kill to spend three weeks discussing physics 
with Gell-Mann; Bethe helped to get astro- 
physics going at the ACP; and Anderson 
shaped condensed-matter physics there for 
three decades. 

Anderson set the tone for the condensed- 
matter field with his influential paper ‘More 
is Different’ (P. W. Anderson Science 177, 
393-396; 1972). Contrary to particle phys- 
ics, in which scientists pursue a reductionist 
quest for simplicity at smaller and smaller 
scales, condensed-matter physics applies the 
basic rules to discover and study the often 
unexpected, emergent phenomena that arise 
in large systems with complicated interac- 
tions, such as superconductivity or biologi- 
cal systems. Today, biological physics has 
emerged as a major activity at the ACP. 

Two other condensed-matter theorists 
played a crucial part: David Pines of the 
University of Illinois at Urbana-Champaign 
and Elihu Abrahams of Rutgers University 
in Piscataway, New Jersey. They pioneered 
workshops on the latest topics to attract a 
balance of researchers from universities and 
from industry (mostly Bell Labs). These 
workshops brought in young hotshots, keep- 
ing the talent pool fresh. One area of scien- 
tific focus, strongly correlated electrons in 
metals, laid the foundations for the current 
understanding of high-temperature and 
other unconventional superconductors. 

In addition to Bethe’s presence, astro- 
physics at the ACP was jump-started by the 
discovery in 1967 of pulsars and their identi- 
fication as neutron stars. The exotic proper- 
ties of pulsars — rapid rotation, superfluidity 
and superconductivity — intrigued Pines 
and other condensed-matter theorists. They 
brought in astrophysicists with expertise in 
relativity and nuclear physics, and work 
done at the centre linked pulsar glitches to 
superfluidity within neutron stars, advanc- 
ing both fields. In 1972, NASA started fund- 
ing an annual workshop, and astrophysics 
had a foothold in Aspen. 

But it was cosmology that caused astro- 
physics to rise to the same level as particle 
physics and condensed matter. Around 
1980, Schramm and others began to realize 
that theories of unification in particle phys- 
ics might revolutionize the sleepy field of 
cosmology, which had been the province of 
astronomers since the time of Edwin Hubble. 
Aspen was the ideal incubator for this 
young, interdisciplinary field. Workshops 
brought together astronomers and physicists 
to discuss the hot topics: Big Bang nucleosynthesis, 
dark matter, inflation, large-scale 
structure, the cosmic microwave background 
and cosmic strings. 

Aspen’s cultural shift started with big-top celebrations for Johann von Goethe’s bicentennial in 1949. 


HIGH IMPACT 

A staggering 10,000 or more papers are 
attributed to visits to the ACP. But its real 
impact is the big ideas that originated there. 
Much of today’s consensus cosmology, with 
its particle dark matter, inflationary origins 
and dark energy, can trace its roots to the 
ACP. In his Nobel prize acceptance speech 
last December, Adam Riess of Johns Hop- 
kins University described how his team, 
which co-discovered that the expansion of 
the Universe is speeding up, regularly met 
at the centre to chart its activities. 

Whether or not string theory is the theory 
of everything, it has changed the course of 
physics. It began as a way to describe the 
strong interactions between neutrons, pro- 
tons and related particles. Supersymmetry, 
the symmetry between bosons and fermions 
and a hallmark of today’s string theory, traces 
its origins to theorist Pierre Ramond’s first 
summer in Aspen in 1970, where, as he put it, 
he “stopped calculating and started thinking”. 
When he got back to Fermilab, he prepared 
the paper that added supersymmetry to string 
theory. This has become the pattern: think in 
Aspen, calculate and write at home. 

String theory was declared dead at a 
1974 Aspen workshop, having been beaten 
by quantum chromodynamics as the best 
description of the strong (colour) interac- 
tions between the quark constituents of the 
hadrons. But John Schwarz of the Califor- 
nia Institute of Technology decided to be 
bold and think bigger, touting strings as 
the path to unifying the forces of the suba- 
tomic world with gravity. For the next ten 
years, Schwarz, his collaborator Michael 
Green and a handful of others tried to make 
good on this promise at the centre. In the 
summer of 1984, their breakthrough came 
with a historic paper that showed the 
mathematical consistency of string theory 
(technically the cancellation of anomalies), 
triggering the first string-theory revolution. 

The Green-Schwarz discovery was 
announced immediately in grand fashion 
during a ‘physics cabaret’ at Aspen’s historic 
Hotel Jerome. In a skit, Schwarz, playing the 
role of Gell-Mann, rushed onto the stage to 
announce that he had discovered the theory 
of everything — and was eventually carried 
off the stage by a man in a white coat. 

Another revolution, in physics publishing, 
traces its origin to a chance encounter on an 
ACP bench in June 1991. Joanne Cohn, a 
young theorist at the Institute for Advanced 
Study in Princeton, had been running an 
informal preprint distribution service, 
e-mailing papers to hundreds of string theo- 
rists who wanted to get the latest results as 
quickly as possible. Paul Ginsparg, then at 
the Los Alamos National Laboratory, asked 
her why she hadn't automated the system. 
By the next day, Ginsparg had written some 
scripts, and two months later the Los Alamos 
arXiv (now residing at Cornell University) 
was opened for business. Today, more than 
1 million articles are downloaded every week. 

Physicists have always been attracted to 
mountains, to hike and to think. ACP co- 
founder Robert Craig was a world-class 
mountaineer who scaled some of the toughest 
and tallest mountains, including K2 in Asia, 
and many physicists who came to Aspen were 
serious climbers. A topographical map of the 
surrounding mountains is displayed promi- 
nently in Stranahan Hall. Aspen’s combi- 
nation of challenging hikes, unpredictable 
mountain weather and crumbling rock means 
that tragedies are not uncommon. 

In 1988, ACP trustee Heinz Pagels, execu- 
tive director of the New York Academy of 
Sciences, slipped on a loose stone while 
climbing on Pyramid Peak in Colorado and 
fell to his death. Pagels was a respected popularizer 
of physics as well as a brilliant theoretical 
physicist, and his name was given to the centre's 
summer public lecture series. 

Schramm, my mentor, was also an expert 
climber. One summer, bad weather trapped 
him and his climbing partner on the face 
of Colorado's Capitol Peak in freezing rain; 
they made it back to Aspen two days later, 
after many had given up hope. Sadly, in 
December 1997, Schramm died in a plane 
crash while flying his twin-engine turboprop 
from Denver to Aspen. 


THE FUTURE 

Today, the Aspen Center for Physics is thriving 
on a 1.6-hectare campus, surrounded by 
a large ‘circle of serenity’ of open space on 
the Aspen Meadows. The Aspen Institute, 
the Aspen Music Festival and School and the 
ACP are recognized as the town’s three major 
cultural institutions. 

Each summer, the ACP’s 16-week pro- 
gramme and 10-15 workshops attract more 
than 500 leading theorists to work on the 
most important problems in physics. With 
delight, I note that about 20% of the attend- 
ees and members of the governing board are 
now women, and that the centre now has its 
first female president — progress for theo- 
retical physics. Discussions fill the offices, 
halls, alcoves and patios; spontaneous vol- 
leyball games occur regularly. New ideas and 
collaborations made in this informal envi- 
ronment have launched the careers of hun- 
dreds of young theorists (myself included) 
and moved physics forwards. 

Will the Aspen formula continue to be as 
successful as it has been for the past 50 years? 
Several theoretical-physics institutes now 
exist — the Isaac Newton Institute in Cam- 
bridge, UK, and the Kavli Institutes in Santa 
Barbara and Beijing. These have longer, more 
formal programmes, and lack the serenity of 
Aspen. The bigger challenge comes with the 
change in the way science is done today. It 
is more collaborative, more connected and 
faster. When the ACP was founded, 
collaboration required face-to-face inter- 
action. Now, with e-mail and the Internet, 
many collaborators have never met in person. 

Aspen continues to be a place to think, 
free from the constraints of everyday exist- 
ence — as Paepcke said, a place for “lifting us 
out of our usual lives”. In today’s fast-paced
world of science, the need for a circle of 
serenity is only more acute.


Michael S. Turner is director of the Kavli 
Institute for Cosmological Physics at the 
University of Chicago, Illinois 60637, USA. 
He is chair of the trustees of the Aspen Center
for Physics, which he has attended since 1979.
e-mail: mturner@kicp.uchicago.edu 


For more on 50th anniversary events at the Aspen 
Center for Physics, see go.nature.com/lepwz8. 
Albert Schatz (left) felt that he deserved a share of the Nobel prize awarded to Selman Waksman (right).


NOBEL PRIZE 


A dark edge to the glory


An examination of the battles behind the prestige of top
awards grips Hidde Ploegh.


Those who enjoy making predictions
ahead of each year’s Nobel announce- 
ments, only to criticize the winners 
when their choices lose out, may find satis- 
faction in Prize Fight. 

Two stories of contested Nobel prizes drive 
the narrative in this readable book by the 
distinguished radiologist Morton Meyers. 
Meyers succeeds in chronicling these events 
without being overtly partisan, and includes 
a few vignettes of well-known and not-so- 
ennobling kerfuffles, with elliptical refer- 
ences to current events. Finally, Meyers 
discusses the general and fascinating ethical, 
professional and philosophical issues that 
arise from the prize fights. 

Meyers first tackles biochemist Selman 
Waksman’s 1943 discovery of streptomy-
cin, and the ensuing altercation with Albert
Schatz about attribution, intellectual prop- 
erty and commercial exploitation of this 
early antibiotic. Waksman — who developed 
soil microbiology and recognized that actin- 
omycetes bacteria are nature’s drug store for 
antibiotics — assigned to Schatz, his gradu- 
ate student, the search for compounds active 
against Gram-negative bacteria. Schatz 
soon came up with Streptomyces griseus as 
a source of streptomycin. 

This drug, the lucrative patent for which 
lists both Schatz and Waksman as its inven- 
tors, turned out to be highly active against 
Mycobacterium tuberculosis. It was respon-
sible for remarkable therapeutic successes
until the emergence
of streptomycin- 
resistant strains and 
isoniazid, an anti- 
biotic used in combi- 
nation therapy, in the 
early 1950s. 

Drawing on historical records and inter-
views with Schatz’s widow, the author
lays out a plausible and fascinating story.
The failure to communicate and manage
expectations led to a painful clash between
Schatz and Waksman. A culture that viewed
the appropriation of subordinates’ discover-
ies by laboratory chiefs as legitimate deep-
ened the chasm.

Prize Fight: The Race and the Rivalry to be the First in Science.
MORTON MEYERS
£16.99/$27

Schatz’s attempts to redress the perceived 
injustice — including a lawsuit against 
Waksman and Rutgers University in New 
Brunswick, New Jersey, later settled out of 
court — probably excluded him from aca- 
demic positions to which he was entitled 
by talent and previous accomplishment. 
Reclamations of this type inevitably invite 
the opprobrium of the larger scientific 
community, not unlike the fate of many a 
whistle-blower similarly castigated for 
attempting to right a wrong. On the fiftieth 
anniversary of streptomycin’s discovery, 
however, Rutgers finally awarded Schatz its 
highest decoration: the Rutgers University 
Medal. 

Meyers’s second story deals with the 
development of nuclear magnetic reso- 
nance (NMR) imaging as a tool in medical 
diagnostics, featuring physician Raymond 
Damadian and his battle with the late Paul 
Lauterbur, a chemist. This is the livelier por- 
tion of the book. Meyers — as a radiologist, 
effectively gardening in his own backyard — 
had the added benefit of interviewing both 
Damadian and Lauterbur. 

With physicist Peter Mansfield, Lauterbur 
in 2003 bested Damadian to Nobel recogni- 
tion of magnetic resonance imaging (MRI). 
This outcome prompted the appearance 
of full-page ads financed by the Friends of 
Raymond Damadian in a number of news- 
papers, including The New York Times. 

As Meyers shows, Damadian was the first 
to suggest using NMR in imaging to distin- 
guish healthy from cancerous tissues. But 
he could not claim to have made the essen- 
tial steps leading to wider practicability of 
the method. He built a massive machine, 
‘Indomitable’, a prototype of which included
a permanent magnet, as well as cardboard 
and copper foil components. Damadian 
used this monster to produce a low-resolu- 
tion image of postdoc Larry Minkoff’s chest 
— which was, indeed, a first. However, many
of the financial dividends from Damadian’s
inventions accrued from patent infringe- 
ment suits, not from building commercially 
successful instruments. Lauterbur’s and 
Mansfield’s contributions were what ulti- 
mately brought the technique to diagnostic 
practice. 

More generally, Meyers uses Prize Fight 
to muse on the obsession with awards and 
publication in top-flight periodicals, which 
can ultimately devalue the passion and inge- 
nuity of so many who will never share that 
limelight. External validation by one's peers 
is an important, but not the sole, driver of 
ambition. Taken to pathological extremes, 
the drive to satisfy this need fuels unethical 
behaviour and scientific misconduct. Those 
victimized by it may bear permanent scars 
or decide to leave science entirely. Milder 
cases abound. Disagreements and fights over 
authorship, priority and recognition occur 
wherever science is practised. 

Meyers emphasizes the human nature of 
scientific pursuit. He reminds us that the 
individual scientist’s contribution is ephem- 
eral: the field moves on. Moreover, every 
scientist who assembles and leads a team 
of graduate students, postdocs and techni- 
cians has to contend with issues of priority, 
authorship and, less frequently, assignment 
of intellectual-property rights. The book has 
yet to be written that lays out such points of 
friction for the kinds of research that will 
never be recognized by the Nobel commit-
tee, yet drive entire disciplines relentlessly
forward — a category into which most of the
science that has a positive impact on society
probably falls.

“Obsession with awards can devalue the passion and ingenuity of so many who will never share that limelight.”

In the airline industry and medicine,
checklists have become an
essential step in preventing human error, 
whether in tightening bolts or measuring 
drug dosages. In the coda to Prize Fight, 
Meyers provides a checklist of sorts on how 
to avoid landing oneself in the intellectual 
and emotional morass that permanently 
colours the outlook of deserving but unrec- 
ognized scientists. 

This starts with being aware of the prob- 
lem, making an effort to be consistent in 
attribution of authorship, and setting crite- 
ria for establishing credit. Easier said than 
done: these issues are unlikely to disappear 
any time soon. But we ignore them at the risk 
of creating toxic working environments.


Hidde Ploegh is professor of biology at
the Massachusetts Institute of Technology
and the Whitehead Institute for Biomedical 
Research in Cambridge, Massachusetts. 
e-mail: ploegh@wi.mit.edu 
SuperFuel: Thorium, the Green Energy Source for the Future
Richard Martin PALGRAVE MACMILLAN 272 pp. £18.99 (2012) 
Post-Fukushima, uranium-powered plants face being phased out in 
many countries. But there is a nuclear alternative, argues clean-
energy-research analyst Richard Martin: thorium. Less volatile
than uranium, four times as abundant, energy-dense and efficient,
thorium has major potential, not least because liquid fluoride thorium 
reactors create no nuclear waste. Martin’s investigation reveals
how the technology, developed at Oak Ridge National Laboratory in
Tennessee, was dropped by President Richard Nixon in 1972 — and 
how interest is now picking up in China, India and elsewhere. 


Born Together — Reared Apart 


Nancy L. Segal HARVARD UNIVERSITY PRESS 416 pp. £36.95 (2012)
The ‘Jim twins’ constituted a watershed in the nature–nurture
debate. When Jim Lewis and Jim Springer — twins separated at four
months — were reunited at 39, both were found to have loved maths,
worked as sheriffs and practised carpentry, among other startling
parallels. The case underlined the importance of genetics and led to
the Minnesota Study of Twins Reared Apart. In this inclusive overview,
Nancy Segal, director of the Twin Studies Center at California State
University, Fullerton, examines the study that turned ideas on 
parenting, teaching, health and sexual orientation upside down. 


Full Body Burden: Growing Up in the Shadow of a Secret Nuclear 
Facility 

Kristen Iversen HARVILL SECKER 416 pp. £14.99 (2012) 

For years, Kristen Iversen’s mother thought that the industrial
complex in their small Colorado town manufactured cleaning 
agents. But this was Rocky Flats — the US government facility where 
the plutonium ‘pits’ of nuclear weapons were manufactured. And, as 
Iversen reveals, it was plagued by safety issues. Among the appalling 
twists in this tale are high levels of testicular cancer among teenage 
boys in the area. After an inter-agency raid in 1989, pit production 
ceased; but Rocky Flats makes for a story with a long half-life. 


Picturing the Book of Nature: Image, Text, and Argument in 
Sixteenth-Century Human Anatomy and Medical Botany 

Sachiko Kusukawa UNIVERSITY OF CHICAGO PRESS 304 pp. £29 (2012) 
Science historian Sachiko Kusukawa probes the role of illustration
in sixteenth-century medical treatises, before the advent of the
microscope. Looking at Leonhart Fuchs’s De historia stirpium,
Vesalius’s De humani corporis fabrica and the unpublished Historia
plantarum of Conrad Gessner, Kusukawa argues that such anatomical
and botanic images were not simply records of natural phenomena,
but varied visual experiments. Her book is studded with illustrative
gems, not least John Dee’s ‘pop-up’ pyramids in Of Euclid’s Elements.


Psychology in the Bathroom 

Nick Haslam PALGRAVE MACMILLAN 184 pp. £50 (2012) 

Arcane sexual behaviours are the stuff of cocktail-party chat, whereas
the “psychology of flatulence” and incontinence remain taboo.
Psychologist Nick Haslam eases open the bathroom door on the
many human behaviours associated with excretion. Drawing on
clinical research, psychoanalytical theory, language, gender and
more, he conducts a fascinating neurogastroenterological journey,
from scatological slang and toilet graffiti to the psychological aspects
of constipation and diarrhoea. ‘Toilet reading’ of a high order.


Birds such as golden eagles eat carrion, recycling the nutrients captured during an animal’s lifetime.


ANIMAL BEHAVIOUR 


Dissecting decay 


Clive D. L. Wynne celebrates a lively exploration of the 
life-and-death cycle in the wild. 


Despite focusing on death and decay,
Life Everlasting is far from morbid;
instead, it is life-affirming. Bernd
Heinrich, emeritus professor of biology at 
the University of Vermont in Burlington, 
does a tremendous job of convincing the 
reader that physical demise is not an end to 
life, but an opportunity for renewal. 

He was prompted to ponder mortality by 
an odd request from a friend, who asked if, 
when he dies, his corpse might be left out for 
the ravens on Heinrich’s land in Maine. That 
set Heinrich off on a quest to understand the 
role of death in life. His journey in this book 
starts and ends in, and often returns to, his 
beautiful Maine woodland, where nature can 
be observed at close range, and occasionally 
friends show up with elderberry wine and 
guitars. 

Heinrich opens with a succession of ani- 
mal corpses that he watches being buried by 
beetles, colonized by maggots, hauled off by 
ravens and vultures, and putrefied by bacteria. 
He starts small, with a deceased mouse, and 
proceeds through ever larger bodies: a freshly 
killed squirrel, a rooster, deer and pig, and 
finally a massive bull moose. Most of these he 
places so that he can watch their decay com- 
fortably from a chair in his cabin. Each corpse 
provides opportunities to describe, in loving 
detail, the life that death provides. 

Beetles with beautiful wings use the dead
mouse as a romantic meeting place. Having
found each other, they carry the carcass to a
safe place for burial. The female lays her
eggs in soil close by. When the larvae hatch,
the parents feed them regurgitated food from
the corpse, staving off decay with antibiotic
anal secretions.

Maggots prefer the deer carcass. Being
larger, it stays warm for longer and encour-
ages bacteria, whose
“soupy by-products” provide sustenance for
maggots. The maggots quickly colonize the
corpse, and, because of their high metabolic
rates, actually raise the temperature inside it
and accelerate their own growth rate, creating
a frenzied positive-feedback loop.

Life Everlasting: The Animal Way of Death
BERND HEINRICH

Looking at marine life, Heinrich contem- 
plates the death of salmon and the disman- 
tling of whales. When a whale carcass comes 
to rest on the ocean floor, it may be kilometres 
deep, in total darkness and at temperatures 
very close to freezing. Nonetheless, there is no 
shortage of specialist scavengers even in these 
extreme conditions, including sleeper sharks, 
hagfish and a wealth of tiny crustaceans. 

Heinrich also reminisces about his time in 
Africa during the 1970s. In these time travels 
he considers the biggest of land beasts — the
elephant. This animal’s death brings humans 
into the picture: Heinrich argues that our 
ancestors were to elephants “what the sleeper 
sharks and hagfish are to the whales of the 
ocean depths: the ultimate recyclers”. 

Discussion of starving people in Zimbabwe 
today hacking at an elephant carcass segues 
into consideration of how early humans 
would have had to learn to cooperate to bring 
down mammoths, armed only with sharp- 
ened sticks and stones. Heinrich suggests that 
learning to process elephant meat taught us 
so much that tackling any other animal left us 
unfazed. He even details the decomposition 
of elephant dung, recounting with relish the 
time he watched half a litre of it be colonized
by some 3,800 beetles in just 15 minutes. 

Heinrich also considers the death of plants. 
When a tree falls in the forest and no one is 
around to take the wood away, what becomes 
of all that timber? In a gripping section on 
“plant undertakers”, he considers how dead 
plants are broken down and the nutrients cap- 
tured during their lives recycled. This process 
is led by sawyer beetles, jewel beetles and bark 
beetles, followed by horntail wasps, countless 
other beetles, fungi, birds such as woodpeck-
ers, centipedes, millipedes and, if the tree falls
into water, fish. Even other trees may use a 
rotting log as a base from which to grow. 

Heinrich argues passionately that we can- 
not and should not fight the return of life 
to life through death. Death is the ultimate 
recycler. In the United States, most people 
choose to have their corpse either pumped 
with toxic formaldehyde and sealed in a steel 
box, or incinerated. The incineration of bod- 
ies in the United States burns enough fuel 
each year to power 80 trips to the Moon. In 
this way, we perpetuate in death the exclu- 
sion of ourselves from the natural world that 
many of us also proclaim in life. 

Extinction, not death, is the real problem. 
Heinrich notes that unloved “undertaker” 
species such as ravens, vultures and condors 
are especially vulnerable to humans’ effects
on ecosystems. The largest cause of loss for 
such species is the complete replacement of 
large wild animals with farming activities that 
displace habitat and take away carcasses that 
nature would have left to return to the earth. 

Ultimately, Heinrich is unable to indulge 
his friend’s desire for a “green burial”. In the
United States, the movement and disposal of 
dead bodies is tightly bound by legal restric- 
tions. He succeeds, however, in a larger aim. 
He replaces the inanimate biblical bookends 
to our lives, ‘dust to dust’, with what we really
are at death: conduits for life. If nature has
its way, the real progression is ‘life to life’.


Clive Wynne is a professor of psychology 
at the University of Florida, Gainesville, 
Florida 32611, USA. 

e-mail: wynne@ufl.edu 


MATHEMATICS 


A life computed 


James Poskett navigates a sophisticated account of 
Alan Turing’s extraordinarily varied intellectual world. 


Alan Turing did not invent the com-
puter. During the 1930s, well before
the Manchester ‘Baby’, Pilot ACE
or EDVAC machines, thousands were in 
operation all across Britain. These ‘com- 
puters’ were women, working in teams and 
each performing a discrete step of a complex 
mathematical operation. 

Employed by the Scientific Comput- 
ing Service (SCS) in the United Kingdom, 
these doughty women solved problems in 
everything from X-ray crystallography to 
jet-engine design. That world of ‘computers 
before computers’ is featured in the open- 
ing gallery of Codebreaker, the celebration of 
mathematician Alan Turing’s life and legacy 
at London's Science Museum. 

Although the SCS may seem far removed 
from Turing’s world, knowledge of its work 
can help us to make sense of his seminal 
1936 article ‘On computable numbers, with 
an application to the Entscheidungsproblem’. 
This is viewed by some as the origin of the 
concepts behind the modern computer. But 
the real story is more complex. 

Codebreaker — Alan Turing’s life and legacy
The Science Museum, London
(21 June 2012–31 July 2013)

Turing didn't pluck the idea of the modern
computer out of thin air. He took the idea of a
team of human computers working together
and abstracted it, imagining a universal
computing machine that could take on
all of the individual tasks allocated to the
women in the SCS.

This was merely a thought experiment for 
him at first — an aid for approaching David 
Hilbert’s notorious ‘decision problem’ on the
question of whether an algorithm exists for
deciding if a given mathematical statement
has a proof or not. 

Interestingly, although sometimes armed 
with little more than a pencil and slide rule, 
the women working for the SCS also used 
basic calculating machines — one of which 
sits in the first room of Codebreaker. These 
typewriter-like contraptions, covered in 
strips of red and white buttons, helped to 
speed up ballistics calculations crucial to 
the war effort. 

The parade of objects that follows reveals 
Turing’s impact, as well as his influences, 
leading the visitor through a vast intellec- 
tual landscape, from aeronautical design and 
biochemistry to cryptography and artificial 
intelligence. 

In one gallery, a German Enigma
machine, tucked into its neat wooden
case, sits among archival photos that
evoke the atmosphere of Second World
War intelligence. Turing is well known to
have worked in wartime cryptanalysis at
the Government Code & Cipher School
at Bletchley Park, UK. But he did not
operate in isolation, collaborating with
Gordon Welchman on the design of the
electromechanical machines used to crack
the code. These devices, each of which
could mimic the action of several Enigma
machines, in turn originated in the ear-
lier work of Polish cryptanalysts such as
Marian Rejewski.


Alan Turing worked on devices to crack the German Enigma machine’s code during the Second World War. 
In a nearby gallery, a twisted white metal
fuselage serves as a poignant reminder of 
the part Turing’s ideas played in develop- 
ing safer air travel after the war. In 1954, the 
world’s first commercial jet airliner, the de 
Havilland Comet registered ‘Yoke Peter’, 
exploded in mid-air, killing everyone on 
board — and prompted the Royal Aircraft 
Establishment at Farnborough, UK, to find 
the cause. 

By this time, Turing’s abstract idea of a 
universal computing machine had become a 
reality. The National Physical Laboratory in 
Teddington, UK, had in 1950 completed the 
Pilot ACE, an electronic computer designed 
to one of Turing’s first practical specifica- 
tions. For a short time, this computer was 
the fastest in the world. The huge rack of 
wires, relays and coloured transistors is dis- 
played alongside the Yoke Peter wreckage. 
It helped to process the enormous amounts 
of data needed to complete detailed analy- 
sis of the debris, eventually revealing the 
point of structural weakness and prompting 
improvements in the design and manufac- 
ture of de Havilland jets. 

The exhibition also explores links between 
the Pilot ACE and Dorothy Hodgkin's work 
on the structure of vitamin B12. One of her
original models of the vitamin, an intricate 
web of more than 100 red and blue balls, 
is on show next to the story of her use of 
computers. Cryptography and X-ray crystal- 
lography had much in common at this time: 
each involved recovering information from 
a scrambled signal. 

Hodgkin's problem was that the images 
she produced indicated only the amplitudes 
of the diffracted waves. To establish the cor- 
rect structure of a molecule, she needed to 
churn through an enormous number of pos- 
sible permutations of different wave phases. 
So she enlisted the help of both the SCS and 
the Pilot ACE, going on to win the Nobel 
Prize in Chemistry for her work in 1964. 

Turing’s relationships with artificial intel- 
ligence and developmental biology are also 
on show through video interviews with 
contemporary mathematical biologists run 
next to programmable robotic tortoises. 
Although Turing didn’t work on these 
machines himself, he was fascinated by the 
possibility of an artificial mind, coming 
to London especially to see robots such as 
these scurry across the floor at the Festival 
of Britain in 1951. 

Codebreaker does an impressive job of 
bringing these diverse histories together. 
Turing is rightly celebrated, not as a lone 
genius, but as an impressive intellect and 
brilliant collaborator.


James Poskett is a science writer based in 
Cambridge, UK, specializing in the history 
and philosophy of science. 

e-mail: james.poskett@cantab.net 
More ways to govern
geoengineering 


You call for stronger governance 
of climate-mitigation strategies 
that reflect the Sun’s energy 
away from Earth (Nature 485, 
415; 2012). We see the scientists’ 
cancellation of a controversial 
field trial for the UK 
Stratospheric Particle Injection 
for Climate Engineering 
(SPICE) project (Nature 485, 
429; 2012) as responsible self- 
governance in the absence of the 
governmental oversight that is 
needed for solar geoengineering 
research. 

The decision to cancel the 
SPICE balloon experiment can 
advance norms for research 
priorities and conditions of 
research. The underlying 
governance principles have 
been articulated by the 
Bipartisan Policy Center's 
Task Force on Climate 
Remediation Research and the 
Solar Radiation Management 
Governance Initiative (SRMGI), 
sponsored by the UK Royal 
Society, the Environmental 
Defense Fund and the Academy 
of Sciences for the Developing 
World (TWAS). 

On the basis of the SPICE 
example, scientists can now 
decide — through projects, 
workshops and professional 
societies — that there should 
be no immediate research 
into deployment methods for 
geoengineering technologies, 
and that they will not engage in 
research that has intellectual- 
property implications. They can 
also learn from SPICE about 
public engagement and ways to 
make research transparent. 

Eventually, legitimate 
governance must grow out 
of consultations with diverse 
constituencies (such as those 
sponsored by the SRMGI) 
and needs to come from 
governmental institutions that 
are fully accountable to society. 
Jane C. S. Long The Bipartisan 
Policy Center, Washington DC, 
USA. janecslong@gmail.com 
Steve Hamburg Environmental
Defense Fund, Washington DC,
USA.
John Shepherd University of
Southampton, UK.


Use fast reactors to 
burn plutonium 


Frank von Hippel and colleagues 
review some disposal options 
for radioactive plutonium waste 
(Nature 485, 167-168; 2012). 
Another option is the profitable 
consumption of plutonium from 
thermal nuclear plants in a fast- 
spectrum breeder reactor with 
fuel recycling. 

A prototype Integral Fast 
Reactor was operated at 
the Argonne West National 
Laboratory in Idaho for 30 years 
until 1994. ‘Burning’ spent 
nuclear fuel produces a fraction 
of the waste of current reactors, 
and it has low radiotoxicity 
(W.H. Hannum (ed.) Prog. Nucl. 
Energy 31, 1-217; 1997). 

The reactor’s metal fuel 
(mainly uranium, plutonium and 
zirconium) and liquid-sodium 
coolant provide passive safety. 
An unpressurized pool vessel 
disperses decay heat by natural 
convection, even when cooling 
pumps are inoperable and the 
heat sink is lost. 

The fuel-recycling system 
generates vast amounts of 
clean electricity, extending 
uranium supplies 150-fold — 
unlike today’s once-through- 
and-throw-away cycle. Its 
proliferation risk is low because 
the products are unsuitable for 
use in fissile weapons. 

The company GE Hitachi has 
designed an integral fast reactor, 
the 311-megawatt electric 
Power Reactor Innovative 
Small Module (PRISM), that is 
intended for commercial use. 

A prototype plant is already 
being considered in the United 
States, and the company has 
recommended these plants to the 
UK government for plutonium 
disposal (see go.nature.com/dwiqvg).

Barry W. Brook University of
Adelaide, Australia.
barry.brook@adelaide.edu.au
Tom Blees, William H. 
Hannum Science Council for 
Global Initiatives, Woodland, 
California, USA. 


Spread the risk of 
antibiotic research 


There are commercial as well 
as scientific barriers to seeking 
out new antibiotics (Nature 
485, 439-440; 2012). These 
discourage the pharmaceutical 
industry from investing in 
further research and clinical 
development, particularly as 
current antibiotics are cheap, 
usually work satisfactorily 
and generate profits for their 
manufacturers and distributors. 
Producing new antibiotics is 
costly, particularly at the clinical- 
trial stage because of concerns 
over safety and resistance. 
Intellectual property is an issue 
in developing new drugs based 
on established antibiotic classes. 
We could wait until the clinical 
situation becomes severe enough 
for the private sector to step in 
with substantial investment, but 
that would be risky. A mix of 
private and public investment 
might work, particularly if the 
public sector were to take on 
some of the commercial risk 
in clinical trials (see Nature 
472, 32; 2011). A pragmatic 
strategy would be to define the 
clinical profiles of desirable new 
antibiotics and then to devise 
commercially viable routes for 
delivering them to the clinic. 
Chris Schofield University of 
Oxford, UK. 
christopher.schofield@chem.ox.ac.uk


Monitor sea pollution 
to stop strandings 


Hundreds of small cetaceans
were stranded along Peru's
northern coast earlier this
year. While the event is
under investigation, Peru's
government should be setting up
programmes to monitor marine
pollution and taking precautions
to protect the coastal ecosystem.

The dead animals comprised 
mainly long-beaked common 
dolphins (Delphinus capensis) 
and Burmeister’s porpoises 
(Phocoena spinipinnis). They 
had internal trauma and lesions 
that could have been caused by 
underwater noise effects (see 
go.nature.com/tbfi7n). Although 
military sonar is known to 
induce cetacean strandings, 
no naval exercises had been 
reported in the area. Neither had 
there been any seismic testing 
associated with gas and oil 
exploration, which can also be a
contributor.

Persistent pollutants that 
accumulate in cetaceans could be 
a factor. These weaken cetacean 
immune systems, making them 
more susceptible to infection 
(P. Ross Hum. Ecol. Risk Assess.
8, 277-292; 2002), exacerbated
by food shortages during El Niño
episodes and harmful algal
blooms. 

Juan Jose Alava Simon Fraser 
University, Burnaby, Canada. 
jalavasa@sfu.ca 


China must provide 
education on HIV 


Cultural factors spanning 
5,000 years force homosexual 
men in China to endure huge 
psychological and social 
pressures (H. Shang et al. 
Nature 485, 576-577; 2012). 
Government authorities need to 
face up to reality and promote 
HIV research and public 
education about transmission of 
the virus to curb its spread. 
Alongside the promotion 
campaigns of World AIDS Day, 
China should provide more ways 
to access up-to-date information 
on prevention and treatment 
of HIV infection. Infected 
people also need proper medical 
assistance, legal safeguards and 
humane care. 
Jian Zhang, Nan Jiang South 
China University of Technology, 
Guangzhou, China. 
zhangjian3954@126.com 
OBITUARY


Akira Tonomura 


(1942-2012) 


Physicist who pioneered electron holography. 


Akira Tonomura changed the field
of fundamental physics through
microscopy. Like botanist Robert
Brown before him, he opened up a new 
world to observation. In the nineteenth 
century, Brown's microscope revealed 
Brownian motion and the cell nucleus. 
In the twentieth and twenty-first, 
Tonomura’s has shown us basic princi- 
ples of the quantum landscape and its 
applications. 

Over decades, Tonomura developed 
the extremely stable, phase-coherent 
electron beams needed for good holo- 
graphic imaging, a technique that ena- 
bles measurement of both the intensity 
and the phase of transmitted electrons. 
This allowed many ‘thought experi- 
ments’ of quantum mechanics to be 
done in practice, and revealed details 
of magnetic and electric fields at the 
nanoscale. Tonomura used electron 
holography to illuminate the wave-par- 
ticle duality of electrons and to measure 
the magnetic fields of superconductors 
and other quantum effects under chal- 
lenging conditions, earning him heroic 
status among electron microscopists. 
His achievements meant he was tipped sev- 
eral times for a Nobel prize. He died of pan- 
creatic cancer in early May, at the age of 70. 

Tonomura spent some of his early child- 
hood in Hiroshima, Japan. Fortunately, 
his family moved away from the city two 
months before the fateful morning in early 
August 1945 when the nuclear bomb was 
dropped. Soon after graduating in phys- 
ics from the University of Tokyo in 1965, 
Tonomura joined the central research labo- 
ratory of the Hitachi Corporation in Tokyo 
and, with crucial encouragement from the 
distinguished electron microscopist Hiroshi 
Watanabe, began his long career. 

At that time, physicist Dennis Gabor, 
working at the engineering company British 
Thomson-Houston, based in London, had 
already proposed the idea of using holog- 
raphy to increase the resolution of electron 
microscopes, and the technique had passed 
some feasibility tests. The most immediate 
and spectacular applications of holography 
were in the field of light optics, in which it 
was used to create stunning three-dimen- 
sional images. This led to a Nobel prize for 
Gabor. Electron holography could advance 
no further until the invention of the electron 
biprism (a positively charged filament that 


324 | NATURE | VOL 486 | 21 JUNE 2012 


causes the electron paths on either side of
it to cross) by Gottfried Möllenstedt at the
University of Tübingen in Germany, where
Tonomura worked briefly in 1973-4. Using
this technology, Tonomura created the first
practical electron holography microscope
for Hitachi in 1978.

Tonomura’s microscope proved crucial in 
settling a controversy over a bizarre quan- 
tum phenomenon. The Aharonov-Bohm 
(A-B) effect states that the phase of an 
electron’s wavefunction can be shifted by a 
nearby magnetic field, even if the electron 
doesn't pass through that field. This idea sits 
uneasily with the classical theories that were 
used to develop practical electron optics. 
Early experiments hinted at confirmation 
of the A-B effect, but critics argued that 
stray fields might have caused the observed 
phase shift. So Tonomura placed a ring 
magnet inside a superconducting sheath 
to eliminate any stray magnetic fields, and 
encased that within a copper layer to stop 
a passing electron beam from entering the 
magnetic field region. He then used elec- 
tron holography to confirm the predicted 
phase shift between electron paths inside 
and outside the ring. This conclusive and 
elegant experiment of 1986 finally silenced 
the critics, and was immediately recognized 
beyond the world of electron microscopy as 
a remarkable tour de force. 
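
For readers who want the quantitative statement, the effect can be written in its standard textbook form. This expression is not quoted in the article, and the symbols are introduced here for illustration: e is the electron charge, ħ the reduced Planck constant, A the magnetic vector potential and Φ the magnetic flux enclosed between the two electron paths.

```latex
\Delta\varphi \;=\; \frac{e}{\hbar}\oint \mathbf{A}\cdot d\boldsymbol{\ell} \;=\; \frac{e\,\Phi}{\hbar}
\qquad\text{and, if the trapped flux is quantized as } \Phi = n\,\frac{h}{2e}:\qquad
\Delta\varphi \;=\; n\pi .
```

The second step assumes the usual flux quantization in a superconductor, which is an addition here rather than something stated in the text. Under that assumption the phase difference between paths inside and outside the ring can only be a whole multiple of π, which is what makes a shielded-ring measurement of this kind so clear-cut.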

Few electron holography microscopes 
outpaced the resolution of ordinary electron 
microscopes as Gabor had envisaged, but
Tonomura’s use of holography to detect 
electron phases allowed him to pioneer and 
dominate the technique’s practical applica- 
tion. In 1989, he scored a major success by 
imaging a magnetic vortex emerging 
from a superconducting film, and built 
on that with observations of vortices in 
various metallic and ceramic supercon- 
ductors. Tonomura and others mapped 
magnetic fields in small particles, in 
magnetic tape and, most recently, in the 
skyrmion lattice — a periodic arrange- 
ment of magnetic vortices generated by 
a complex structure of electron spins. 
In quantum computing, observations 
of superconducting vortices are key 
to investigating the behaviour of these 
potential ‘qubits’.

Tonomura combined stubborn per- 
sistence and experimental skill with 
imagination and excellent communica- 
tion skills. He presented material with 
scrupulous care in publications and 
seminars. In a memorable Royal Institution
lecture in 1994, he filled the strictly
allotted one-hour time slot almost to the 
second. His images of magnetic phe- 
nomena were so striking that they were often 
on the cover of major journals. An Internet 
video of his version of the classic ‘double- 
slit experiment’ continues to demonstrate 
for many the central mystery of quantum 
mechanics (see go.nature.com/722hph). 
It shows how electrons travelling through 
a biprism arrive at a detector one by one, 
as particles, but over time build up a wave 
interference pattern. 

Tonomura’s stellar reputation and powers 
of persuasion helped to secure financial 
support from the Japanese government 
for his ambitious ideas. In 2010, he was 
awarded the largest grant for an individ- 
ual research project in the country’s his- 
tory (see Nature 464, 966-967; 2010). He 
became seriously ill a year later. In the spirit 
of Gabor’s original idea for boosting resolu- 
tion, the project aims to use high-voltage 
electron holography to create 3D images 
of electron wavefunctions. Its future now 
depends on securing a leader as inspiring 
as Akira Tonomura.


Archie Howie is a physicist and electron 
microscopist in the Cavendish Laboratory, 
University of Cambridge, UK. 

e-mail: ah30@cam.ac.uk 
FUTURES SCIENCE FICTION


21ST-CENTURY GIRL 


BY ADRIAN TCHAIKOVSKY 


“Do you dream of mammoths?” a talk-show
host once asked me. I knew not to give him my
first choice of answer; that would have
enlightened his audience over the 
course of three and a half hours 
— so long as they had the basic 
grounding in biosciences 
required to understand it. I 
also knew enough to avoid my 
second answer, which was that 
his question was unintelligent, 
and that the entire interview 
had taken time I could have spent 
better in the laboratory. 

What I actually said was: “No
more than you do,” and the sound
that the audience made told me that 
they liked that. I had gained their sym- 
pathy somehow. I couldn't see precisely 
the mechanism by which this had happened, 
but I filed the memory away with all the oth- 
ers, my empirical evidence of the human 
condition by which I attempt to govern my 
social interaction with my fellow hominids. 

Another one I get is: “You must have 
had a difficult childhood.” That throws me 
because it isn’t a question, and so you can't 
really answer it. It’s a statement, to which the 
only response, if response is even required, 
is: “Yes.” Of course I must. Why say the 
obvious, except that interviews are all about 
them saying the obvious, and me replying 
with lies and simplifications because that is, 
I have learned, what they want to hear.

When I was 15, my foster-parents took me 
aside for The Talk. The bulk of my difficult 
childhood was behind me, although I had 
yet to learn most of the coping strategies 
I now rely on. I was essentially friendless, 
more comfortable interacting online than 
off, an academic overachiever and unable to 
understand why that didn’t come with the 
positive social pay-off that I had been led 
to expect. I didn't like crowds or strangers 
much. My world was comfortable with only 
a few other people in it. 

No different, really, from hundreds of 
other children across the world. 

I had thought that I knew what The Talk 
was going to be. They had never said, but 
I knew I wasn’t their natural offspring, by 
deduction from first principles. I didn’t look 
like them. I was built differently, and I’d spent
ages looking at my face in the mirror, tracing 
the contours of nose and chin and forehead. 


A child out of time.


I was a striking girl. People who know about
me now say I'm ugly, but that’s a judgement 
influenced by their foreknowledge — they 
think I should be ugly, and so they recast my
features in that unflattering light.
Striking, is the word I prefer.
Not even unique, if you take each feature
on its own. Not resembling
my foster-parents, though.

I know, I told them. I’m adopted. They 
were unsure how to proceed. I could see that 
this was not, in fact, what The Talk was to be 
about. Perhaps I shouldn't have said anything, 
but that is something I still have difficulties 
with: knowing when to withhold knowledge. 
It seems so counter-intuitive to do so. 

We had The Talk, at last. I wonder how
many other disaffected, unsociable children 
wait for just that revelation: you are some- 
thing special; there is a reason why you are not 
like them. The Talk was about adoption, in a
way, about telling me who my parents were. 
My mother was stem-cell research and my 
father was gene sequencing. 

When I was 20, and had been accepted for 
my doctorate, I made the decision to go pub- 
lic. You will recall the media storm. Nobody 
knew quite what to do with me. The geneticists
behind my genesis had done something
unethical, and yet at the same time their
detractors wanted to study me. There were
legal battles, in which I was
a determined participant. If I was to be a test
subject, stripped of human rights, then at the 
same time my reviled creators were guilty of 
nothing more than making a thing. Alternatively,
if they had broken the boundaries
of professional ethics, then it could only be 
because I was a human being. 

In having me raised among their own 
kind, in showing that I was intellectually, 
and at least borderline socially, functional, 
they sealed their own professional fate and 
secured mine. They must have known. 

They have been forgiven, since, 
because genius is too valuable a qual- 
ity to waste. As for me... 

I did not go into my current dis- 
cipline purely because of my unique 
past. I became a geneticist because it is 
an area in which my cognitive strengths 
shine. My ability to find patterns in com- 
plex data, and to focus without distraction 
on the small details of my work, is as appo- 
site for the minutiae of modelling gene- 
sequencing outcomes as it would have been 
for the painstaking production of exacting 
stone tools. In fact, the further I progress in 
my profession, the more I meet people who 
are just like me, despite our different herit- 
ages — and the less our different heritages 
matter in any meaningful way. 

I will have sisters, soon, and brothers, as 
close to me as blood-kin. That project is 
proceeding specifically because I have been 
more than a success: I and my people have 
valuable intellectual traits that the world can 
use. I am not working on that team, though.
I have other genomes to sequence, other 
verdicts of history to reverse. 

And still people want to know, “What's it 
like, being you? What is it like to be brought 
back, to be taken from your proper time?” 
And I answer that this is my time, that I am
a child of the twenty-first century. And if I,
Homo sapiens neoneanderthalensis, did not 
evolve to live in cities and use the Internet 
and make advances in the field of genetics, 
then neither did Homo sapiens sapiens, and 
we will both have to make do. 

And why would I need to dream of mam- 
moths when these days I can step out of my 
office and just watch them?


Adrian Tchaikovsky was born in 
Lincolnshire, studied psychology and 
zoology at Reading and now practises law 
in Leeds. More of his work can be found at 
www.shadowsoftheapt.com. 
The Great Eruption of η Carinae


ARISING FROM A. Rest et al. Nature 482, 375-378 (2012). 


During the years 1838-1858, the very massive star η Carinae became
the prototype supernova impostor: it released nearly as much light as a 
supernova explosion and shed an impressive amount of mass, but 
survived as a star’. In the standard interpretation, mass was driven 
outward by excess radiation pressure, persisting for several years. 
From a light-echo spectrum of that event, Rest et al.* conclude that 
“other physical mechanisms” are required to explain it, because the gas 
outflow appears cooler than theoretical expectations. Here we note that 
(1) theory predicted a substantially lower temperature than they 
quoted, and (2) their inferred observational value is quite uncertain. 
Therefore, analyses so far do not reveal any significant contradiction 
between the observed spectrum and most previous discussions of the 
Great Eruption and its physics. 

Rest et al. state that a temperature of 7,000 K was expected, and that 
5,000 K is observed. These refer to outflow zones that produced most 
of the emergent radiation. For the 7,000 K value those authors cite a 
1987 analysis by one of us’, but they quote only a remark in the text, 
not the actual calculated values. According to figure 1 in ref. 3, η
Carinae’s Great Eruption should have had a characteristic radiation
temperature in the range 5,400-6,500 K, not 7,000 K. (Here we
assume mass loss exceeding one solar mass per year and luminosity
exceeding 10⁷ solar luminosities’.) The mention of 7,000 K in ref. 3
concerned less extravagant outbursts, and η Carinae was explicitly
stated to differ from them. Moreover, to establish a conflict between 
observations and expectations, new calculations with modernized 
opacities would be needed. 

The approximately 5,000 K temperature ‘observed’ by Rest et al. is 
based on a derived classification for the light-echo spectrum, using 
automated cross-correlations with a set of normal supergiant stars. 
This technique may be suitable for mass-production normal spectra, 
but any non-routine object requires specific feature-by-feature com- 
parisons instead. One of the first principles of stellar classification is 
to separate luminosity from temperature criteria, but all the reference 
stars in this case were far less luminous than η Carinae’s eruption.
(Luminosity correlates with surface gravity, which affects gas density 
and thereby the spectrum.) Furthermore, emission lines appear to be 
present and may contaminate an automated analysis; but without access 
to the spectrum we cannot verify this. Rest et al. used a temperature 
calibration from a 1984 reference’ taken from an even older publication 
in 1977°. Considerable work has been done since then, and for the 
highest luminosities, each spectral type has a substantial range of 
possible temperatures, for example 5,100-6,200 K for the G2-G5
spectral types favoured in their paper**. The temperature range
indicated by stellar classification thus overlaps the theoretical expectations.

Moreover, η Carinae’s eruption was a large-scale mass outflow, not
a static atmosphere with definable surface gravity. This distinction
quantitatively alters the relation between absorption lines and the
underlying continuum. The characteristic radiation temperature T₀
in the 1987 theoretical description’ is therefore defined differently
from a normal star’s ‘effective temperature’. If spectral types are
assigned to outflows, there is no reason to expect their temperatures
to coincide with the stellar-atmosphere calibration adopted by Rest et
al.’. This is not a question of stellar wind versus explosion; dense
winds, stellar eruptions, and opaque explosions are basically alike in
their emergent radiation physics'’, and their density dependences
ρ(r) differ in character from normal stellar atmospheres. In conclusion,
as far as existing models allow anyone to say, the observed
spectrum appears to be consistent with what one expects for a giant
eruption with η Carinae’s parameters.


Rest et al. reply

REPLYING TO K. Davidson & R. M. Humphreys Nature 486, http://dx.doi.org/10.1038/nature11166 (2012).


In our Letter¹ reporting the light echoes of η Carinae we analysed the
spectral characteristics of η Carinae during the Great Eruption of the
mid-1800s, and found the line content to be similar to that of G
supergiant stars. This we interpret as evidence that η Carinae’s
Great Eruption was not a typical luminous blue variable (LBV) 
outburst, because spectra similar to those of F and A supergiant stars, 
earlier and hotter than G-type stars, are observed in LBV eruptions of 
all kinds, in agreement with theoretical predictions. Davidson & 
Humphreys² object that our spectral type and temperature estimate
are not sufficiently robust, and that the spectral features are in
agreement with their theoretical predictions.

The issues raised by Davidson and Humphreys² involve elementary
considerations of stellar atmospheres and LBV spectra, as well as 
deeper ones of epistemology. We are well aware of these issues, but 
they are outside the scope of a Nature Letter. As they say, other things 
(such as chemical abundances) being equal, a spectrum is determined 
by the temperature and pressure. In stellar atmospheres, the latter is in 
turn determined by the gravity, or more fundamentally the mass and 
radius. In an LBV outburst, the physical situation is entirely different
and at present unknown; in the accompanying News & Views’ Soker 
and Kashi describe their favoured hypothesis of a role for episodic 
binary mass transfer in the Great Eruption, but they also say that an 
instability in the primary itself is an equally viable hypothesis. 
Basically, we do not know whether Great Eruptions are late evolu- 
tionary events in all hypermassive stars, or whether they occur only in 
binary systems. There is no definitive model for these events, whether 
primitive or modern, and hence any derived physical parameters are 
highly uncertain. The comparison with stellar spectral types provides 
only a description of the line content of the LBV spectrum. By the 
same token, the comment that the Great Eruption of η Carinae was
more luminous than the comparison supergiant stars is irrelevant. 

The absolute temperature derived for an LBV outburst spectrum, 
whether observationally or theoretically, is virtually meaningless, 
because there are no reliable models for the physical structure that 
produced it. However, the relative spectral types and temperatures at 
different stages of these events, or among different LBV stars at similar 
stages, may be more meaningful and indeed are traditionally used by 
all LBV specialists for descriptive purposes. For example, during LBV 
outbursts the spectral type becomes later and the apparent temper- 
ature lower towards the visual-light maximum. The G spectral type at 
the Great Eruption peak, which we derived from detailed comparison 
of several spectral features with those of the supergiants, both visually 
and by cross-correlation, is unusually late and unprecedented for an 
LBV outburst. It may be related to the huge amplitude of this event or 
additional physical mechanisms in η Carinae’s Great Eruption. Our
suggestion of an explosion and blast wave is motivated by the large 
ratio of kinetic to radiative energy in the event’, and by the direct 
observation of velocities up to several thousand kilometres per second 
in some of the ejecta’. 
DRUG DISCOVERY


Computer model predicts side effects 


Drug candidates are usually found to be unsafe only late in the drug discovery process. A method for predicting the many 
biological targets of a given molecule might allow drug safety to be considered much earlier. SEE ARTICLE P.361 


KYLE KOLAJA 


Creating safe, effective drugs is not for
the faint hearted. Pharmaceutical
companies screen up to millions of
compounds to find the handful that are suitable
for clinical development. The process
typically begins with analogues of biologically 
active compounds being made to find those 
that interact potently and selectively with the 
drug target of interest. Attention then shifts to 
other properties of the compounds that could 
affect their use as drugs, including their safety 
profile. Unfortunately, predicting compound 
toxicity — let alone adverse side effects in 
patients — in preclinical studies is an onerous 
task. But on page 361 of this issue, Lounkine 
et al.’ report a computational approach for 
predicting side effects that might help to 
streamline drug discovery*. 

Identifying the mechanisms behind 
adverse drug effects remains inherently dif- 
ficult, despite major investment in predictive 
tools by pharmaceutical and biotechnology 
companies (Fig. 1). The challenge is often 
magnified by the limited availability of many 
compounds, the need to identify toxicity risks 
in a short space of time and the difficulty in 
identifying the biological targets that medi- 
ate any observed toxicity”. Side effects can 
result from the intended action of a drug 
(such as the bleeding associated with medica- 
tions that prevent blood clots) or because of 
unintended ‘off-target’ activity (such as the 
increase in blood pressure caused by torce- 
trapib, a compound that was designed to raise 
levels of ‘good’ cholesterol in the blood’). Any 
method that can rapidly and reliably identify 
the off-target activities of a compound, and the
associated commercial liabilities, early in the 
drug-discovery process would be a powerful, 
must-have tool for a toxicologist. 

Lounkine and colleagues use two steps to 
determine the biological targets for which a 
molecule may have affinity. The structural sim- 
ilarity of known ligand molecules for a given 
biological target is first calculated and the 
structural relationship of the group of ligands 
with a test compound is then assessed. This 
approach helps to reveal associations between 


*This article and the paper’ under discussion were 
published online on 10 June 2012. 
Figure 1 | Assessing the risks. The adverse side effects of drugs are difficult to foresee. Lounkine and
colleagues’ computational method' for predicting such effects opens up a fresh approach for safety
assessment in drug discovery.


compounds that would not be obvious from 
more conventional analyses of chemical struc- 
tures or the amino-acid sequences of receptors. 
This mathematical method for predicting ‘guilt 
by association’ doesn’t require any compound 
material to generate testable hypotheses, 
making it a neat way to rapidly interpret 
observations of toxicity. 
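
To make the ‘guilt by association’ idea concrete, here is a toy sketch of its second step: scoring a test compound against the set of known ligands for each target. Everything in it is an assumption made for illustration. The article does not say which similarity measure or statistical model the authors used, so the Tanimoto coefficient, the set-of-integers fingerprints, the 0.5 threshold and the names `tanimoto`, `set_similarity` and `rank_targets` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: a compound is a set of structural-feature ids.
Fingerprint = Set[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Shared features divided by total distinct features (0 = unrelated, 1 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def set_similarity(test: Fingerprint, ligands: List[Fingerprint],
                   threshold: float = 0.5) -> float:
    """Sum the pairwise similarities between the test compound and a target's
    known ligands, ignoring weak matches below the (assumed) threshold."""
    scores = (tanimoto(test, ligand) for ligand in ligands)
    return sum(s for s in scores if s >= threshold)


def rank_targets(test: Fingerprint, ligand_sets: Dict[str, List[Fingerprint]],
                 threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Order targets from most to least similar ligand set."""
    scored = {t: set_similarity(test, ls, threshold) for t, ls in ligand_sets.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Made-up data: two targets with two known ligands each.
ligand_sets = {
    "target_A": [{1, 2, 3, 4}, {1, 2, 3, 5}],
    "target_B": [{7, 8, 9}, {8, 9, 10}],
}
test_compound = {1, 2, 3, 6}

for target, score in rank_targets(test_compound, ligand_sets):
    print(target, round(score, 2))
```

A real method would also need the first step described above (characterizing how similar a target's ligands are to one another) and a statistical test of whether a raw score is higher than chance; this sketch does neither.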

To test their computational approach, 
Lounkine et al. used it to estimate the bind- 
ing affinities of a comprehensive set of 656 
approved drugs for 73 biological targets. They 
identified 1,644 possible drug-target interac- 
tions, of which 403 were already recorded in 
ChEMBL, a publicly available database of bio- 
logically active compounds. However, because 
the authors had used this database as a training 
set for their model, these predictions were not 
really indicative of the model’s effectiveness, 
and so were not considered further. 

A further 348 of the remaining 1,241 predic- 
tions were found in other databases (which the 
authors hadn’t used as training sets), leaving 
893 predictions, 694 of which were then tested 
experimentally. The authors found that 151 of 
these predicted drug-target interactions were 
genuine. So, of the 1,241 predictions not in 
ChEMBL, 499 were true; 543 were false; and 
199 remain to be tested. Many of the newly
discovered drug-target interactions would not
have been predicted using conventional com- 
putational methods that calculate the strength 
of drug-target binding interactions based on 
the structures of the ligand and of the target’s 
binding site. 
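
The bookkeeping in the preceding two paragraphs is easy to lose track of, so the short script below simply re-derives the totals from the figures quoted in the text. The variable names are ours, and the final ratio (false predictions as a share of those that have been resolved either way) is one reasonable way to read the numbers rather than a statistic reported by the authors.

```python
# Figures quoted in the text.
total_predictions = 1644
in_chembl = 403                # overlap with the training database, set aside
in_other_databases = 348       # confirmed by databases not used for training
tested = 694                   # predictions tested experimentally
confirmed_by_experiment = 151  # tested predictions that proved genuine

# Derived quantities.
not_in_chembl = total_predictions - in_chembl
unconfirmed = not_in_chembl - in_other_databases
true_predictions = in_other_databases + confirmed_by_experiment
false_predictions = tested - confirmed_by_experiment
untested = unconfirmed - tested
false_fraction = false_predictions / (true_predictions + false_predictions)

print(not_in_chembl, unconfirmed, true_predictions, false_predictions, untested)
print(f"{false_fraction:.1%} of resolved predictions were false")
```

The derived values match those stated in the text (1,241; 893; 499; 543; 199), and the three outcome categories sum back to the 1,241 predictions not found in ChEMBL.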

The authors built on their results by iden- 
tifying adverse side effects associated with a 
particular chemical structural fingerprint, 
again using a guilt-by-association mathe- 
matical approach. Their model predicted an 
impressive 247 new drug-target-side effect 
associations, shedding light onto previously 
unappreciated possible mechanisms of side 
effects. For example, it suggested that pre- 
nylamine (a drug that widens blood vessels) 
causes sedation by acting at the histamine H₁
receptor. And it predicted that the anti-allergy 
drug diphenhydramine causes tremors by 
acting not only at a known off-target recep- 
tor (the sodium channel SCN10A), but also at 
a dopamine transporter protein, SLC6A3, for 
which the drug was not previously known to 
be a ligand.

Lounkine et al. went on to provide strong 
experimental evidence of a previously unrecognized
mechanism for the upper abdominal
pain caused by chlorotrianisene, a drug used
to treat symptoms of menopause and deficien- 
cies in ovary function. Their computational 
method predicted that chlorotrianisene has 
potent affinity towards the COX-1 enzyme; 
COX-1 is a target of non-steroidal anti- 
inflammatory drugs, a class of molecules 
that has long been associated with adverse 
gastrointestinal events’. The authors found 
that chlorotrianisene potently inhibits plate- 
let aggregation in samples of human blood, 
which is a widely accepted biomarker of the in 
vivo effects of COX-1 inhibition and connects 
chlorotrianisene’s adverse effect to a plausible 
mechanism. This example demonstrates how 
rapidly Lounkine and colleagues’ computa- 
tional predictions can glean information about 
the off-target toxicology of drugs. 

Although the authors’ computational
approach is highly informative, it does have
some deficiencies. Roughly half of its predictions
of adverse drug activity are false, which
could prompt unnecessary biological experimentation
in drug-discovery programmes,
or even stop a safe compound from being
developed further as a drug. And, as with any
test, Lounkine and colleagues’ approach will
overlook certain drug activities, although
such false negatives are less disconcerting
at the early stages of drug discovery because
thorough testing of on- and off-target effects
will subsequently be conducted as a matter of
course. It is also worth noting that the authors’
assessment of their technique’s performance
is based on a retrospective training set;
prospective use will be required to fine-tune
the model and to discover how well it truly
performs.

Nonetheless, a computational method for
predicting which biological sites are likely to be
targeted by compounds provides a new way to
improve safety assessment in drug discovery.
Because Lounkine and colleagues’ approach
doesn't require compounds to be synthesized
or wet-lab experiments to be conducted, it will
be useful very early in the discovery process
to predict the potential for off-target binding
of drug candidates, and so to prioritize which
compounds should be made by medicinal
chemists. The ability to associate side effects
with predicted off-target activity means that
chemists can be guided not only by the efficacy
of compounds, but also by safety concerns,
as soon as a drug-discovery programme is
initiated.


MATERIALS CHEMISTRY


Carbon origami


A reaction that folds up large aromatic molecules and fixes them into bowl shapes
expands opportunities for making nanometre-scale objects from single sheets of
carbon. Such objects have potential applications in electronics.


JAY S. SIEGEL


Single layers of carbon atoms, known
as graphene sheets, have been fêted as
the material of choice for future generations
of electronic devices. But carbon
can form a broad range of networks besides
the hexagonal ‘chicken-wire’ arrangement of
atoms found in graphene. For example, networks
formed from combinations of rings
of five, six or seven carbon atoms can result
in bowl, tube and saddle morphologies‘
that have diverse physical and electronic
properties of use in applications ranging
from structural engineering to light-energy
conversion. Although they may form
spuriously in nature, their selective preparation
is difficult, requiring a high level of expertise
in the crafts of molecular design and chemical
synthesis. Reporting in Angewandte Chemie,
Amsharov et al.” describe a practical method
for preparing bowl-shaped carbon structures
(buckybowls) in high yields, offering the
tantalizing prospect that other extended,
non-planar networks of carbon atoms will be
accessible in the future.

The authors start from polynuclear aromatic
compounds (Fig. 1), which can be thought of
as small graphene fragments. Transformation
of these fragments into buckybowls can be
achieved by forming carbon-carbon (C-C)
bonds exclusively across ‘cove’ regions in the
molecules (regions in which the carbon
skeleton of the molecule curves inwards like
the coastline around a cove). These reactions
result in the formation of five-membered rings
that induce the curvature found in carbon
bowls and tubes; however, selectively forming
C-C bonds across cove regions is easier said
than done.

Amsharov et al. offer a solution to this problem
by heating a polynuclear aromatic compound
that bears a fluorine atom in the cove
region with an alumina catalyst (Al₂O₃). This
is a remarkable approach because carbon-fluorine
bonds are generally regarded as being
very strong, and therefore unreactive. The
authors observed that only fluorine atoms in
cove regions take part in the transformation:
when they performed their reactions on
starting materials containing fluorine atoms
at non-cove regions, these atoms remained
unaltered in the product.

The transformation occurs at temperatures
substantially lower than those required by
alternative methods for making carbon networks,
such as flash vacuum pyrolysis (FVP),
in which the reactant is rapidly heated in a
vacuum, or chemical vapour deposition’, in
which starting materials react on the surface
of a substrate to make a thin film. What’s more,
the method works well with solid substrates
in the absence of a solvent, affording well-characterized
products in good yields. This is
handy, because many of the substrates that are 
likely to be used in the reaction are insoluble 
in most solvents. The authors also report that, 
in compounds that have several cove regions 
bearing a fluorine atom, C-C bonds can form 
at multiple regions. 

Amsharov and colleagues’ reaction is one of 
several transformations’ that create new C-C 
bonds between aromatic rings, starting from 
aryl halides — compounds in which an aro- 
matic ring is attached through a carbon atom 
to a halogen, typically chlorine, bromine or 
iodine. For example, FVP is known to effect 
intramolecular reactions of this sort by elimi- 
nating hydrogen halide molecules (HX, where 
X is a halogen) from aryl halides. These FVP 
processes involve highly reactive intermedi- 
ates, such as free radicals, carbenes or benzyne. 
Aryl halides can also react on certain metal 
surfaces to form ribbons or star-shaped junc- 
tions of graphene’; catalysis by the metal makes 
the reaction possible. 

Most such reactions of aryl halides work 
best with starting materials that bear weak 
carbon-halogen bonds: carbon-iodine bonds 
are generally the weakest, then carbon- 
bromine bonds, followed by carbon-chlorine 
bonds. Carbon-fluorine bonds are typically 
even stronger than carbon-chlorine bonds, 
and so Amsharov and colleagues’ use of these 
in C-C bond-formation reactions is unusual, 
with few other examples known”. 

The development of strategies to manu- 
facture graphene and related carbon-based 
materials that have designed shapes and sym- 
metries’ is one of the foremost challenges in 
materials chemistry. Similarly, the race to find 
an industrial method for producing graphene 
sheets of decimetre-to-metre dimensions is 
hotly contested. Recent developments, such
as kilogram-scale processes for preparing the
archetypal buckybowl, corannulene””, will
help by increasing the commercial availability
of building blocks for carbon nanotubes and
graphene fragments.

Figure 1 | Bent into shape. Amsharov et al.’ report a method for making buckybowls, nanometre-scale
objects that can be thought of as a single layer of carbon atoms bent into a bowl shape. They used
a reaction in which a carbon-carbon bond forms across the ‘cove’ regions (blue areas) of polynuclear
aromatic compounds, such as the benzopicene molecule in this example. The reaction requires an
alumina (Al₂O₃) catalyst and occurs only where fluorine atoms are present at cove regions. The fluorine
atoms in the starting material, and the bonds formed during the reaction, are shown in red. The structure
shown beneath the reaction is the X-ray crystal structure of the buckybowl product, indacenopicene.

To understand one way in which such frag- 
ments could be used, consider the example of 
a fullerene — a ‘buckyball’ of carbon atoms, 
in which the atoms form the vertices of a net- 
work of polygons that makes up a spherical 
surface. Imagine cutting the network open. 
Depending on where you cut, various two- 
dimensional projections can be obtained!” 
that can be folded back into the ball. If poly- 
nuclear aromatic compounds are available that 
correspond to such projections, then synthetic 
methods such as that of Amsharov et al. could 
be used to fold up the molecules and ‘glue’ the 
edges together. 

The synthesis of specific projections is 
therefore a fashionable strategy for prepar-
ing designer fullerenes. Compounds such
as corannulene are good starting points for 
many such projections, so having a readily 
available supply opens up opportunities to 
make these materials. In the same way, the
synthesis of molecules that can be thought 
of as cross-sections of carbon nanotubes has 
become something of an art form, with one
idea being that these could be stacked up and 
joined together to make nanotubes. 

It is worth noting that, although many 
aspects of chemical synthesis may be thought 
of as having the character of engineering, 
chemical methods are not generally trans- 
ferable to every synthetic problem — unlike 
universal engineering tools, which can be 
used transferably to make almost anything. 
The optimal construction of molecular tar- 
gets often requires finding exactly the right 
tool for each job. Nevertheless, seemingly 
specialized synthetic methods do evolve into 
everyday tools used by chemists around the
world and, ultimately, into scalable processes 
for industrial chemical production. In this 
way, Amsharov and colleagues’ findings, and 
those of others aiming to make similar materi- 
als, are vital because they reveal new types of 
efficient chemical transformation that might 


some day form the basis of the manufacture 
of graphenic materials tailored to a variety of 
applications.
The breast cancer landscape


Whole-genome sequencing of breast cancers is exposing the scope of tumour 
diversity and helping to pinpoint avenues for precise diagnostics and targeted 
therapy. SEE ARTICLES P.346 & P.353, LETTERS P.395, P.400 & P.405


JOE GRAY & BRIAN DRUKER 


The information-generating power
of genome-analysis technologies is
increasing at a rate that surpasses even
the doubling in computer performance that
is achieved every 18 months by the semicon-
ductor industry. Genome-analysis methods
are now sufficiently powerful, fast and reliable 
that they are underpinning efforts to elucidate 
the molecular architecture of human cancers
and, in some cases, can be used in routine
clinical practice. Five papers published in
this issue present whole-genome analyses of 
a range of different breast cancers, providing 
a comprehensive picture of breast cancer’s 
genetic diversity and suggesting refined 
tumour-classification strategies and new lines 
of therapeutic attack. 
Genome-analysis techniques allow
detection of various genetic abnormalities, 
including DNA sequence changes, structural 
changes (such as translocations) that alter the 
order in which DNA segments occur in the 
genome, and copy-number variants, in which 
whole segments of DNA are deleted, dupli- 
cated or amplified. These techniques can also 
provide information about the transcriptome
(the complement of RNA molecules that are
transcribed from DNA) at a particular time
or in a particular tissue. Furthermore, they 
can give an indication of epigenetic modifi- 
cations, which are chemical modifications 
to DNA and associated proteins that regulate 
gene transcription rates without changing the 
nucleotide sequence. 

Early transcriptome-wide studies of patients 
with breast cancer demonstrated that the 


tumours could be organized into subtypes on 
the basis of patterns of gene expression that 
differed in clinical outcome. This advance
led to the development of clinical assays that
are now used to identify patients at suffi-
ciently low risk of cancer recurrence to warrant
surveillance in lieu of chemotherapy.
Genome-wide measurements of DNA 
sequence, copy number, structure and gene- 
expression levels during the past decade have 
revealed remarkably diverse derangement 
in individual breast tumours, among dif- 
ferent tumours and during various stages of 
tumour development. These aberrations 
involve many genes, including several 
implicated in cancer.

Expanding on these analyses are genome-
wide functional studies, which have begun
to identify aberrant genes that contribute to 
the changes in physiological processes that 
occur in breast cancer, in the hope that these 
might yield to therapeutic attack. These 
studies were inspired by the development 
of therapies that effectively treat tumours 
in which the gene ERBB2 is amplified. In
parallel, in vitro systems that model aspects 
of breast-cancer diversity are being used 
to identify molecular features that predict 
therapeutic responses. These efforts are
now being stimulated by the development of 
high-resolution and high-throughput tech- 
niques, including microarray analysis and 
massively parallel DNA and RNA sequencing, 
that provide detailed information about the 
nature of cancer-causing aberrations.

The five papers in this issue describe the 
application of these techniques to a range 
of breast cancers. Curtis et al. (page 346)
analysed copy number, sequence changes 
known as single nucleotide polymorphisms, 
and gene-transcription rates in approximately 
2,000 breast cancers encompassing all known 
types. They defined 45 regions of sequence 
amplification or deletion that deregulate genes 
that are likely to be involved in the pathophysi- 
ology of breast cancer. These regions are now 
sufficiently well defined for researchers to 
commence studies of the driver genes expected 
to lie therein. Curtis and colleagues also inte- 
grated their measurements of copy number 
and gene expression to define new breast- 
cancer subtypes that are associated with 
different patient outcomes, although these 
associations are not yet strong enough to be 
applied in the clinic. 

Triple-negative breast cancers are those in 
which the tumours express neither ERBB2 nor 
the genes that encode the receptors for oestro-
gen or progesterone. Shah et al. (page 395)
assessed mutations, copy number and gene 
expression in triple-negative breast cancers 
and found that the frequencies of copy- 
number abnormalities and mutations vary 
markedly between and within the tumours, 
which indicates that mutations can arise at 
multiple stages of tumour progression. Their 


study also suggests that three genes, TP53, 
PIK3CA and PTEN, are involved in the early 
stages of breast-cancer development. Interest- 
ingly, only one-third of the low-prevalence 
mutated genes that the authors identified were 
transcribed into RNA, which suggests that they 
may be chance mutations unrelated to the 
cancer and/or that the mutations involved 
genes with tumour-suppressive activity. 
Stephens et al. (page 400) analysed somatic
mutations and copy-number variants in
100 breast cancers and discovered numerous
aberrations, including nine new cancer genes
that were mutated rarely but more frequently
than would be expected by chance, indicat-
ing that they have functional roles in the
cancers in which they arise. Many of these
mutations are predicted to lead to the
expression of truncated proteins, which
implies that the proteins, in their normal
forms, are likely to have tumour-suppressor
roles. The
authors also report that certain DNA-base 
substitutions are strongly associated with 
the age of the patient in tumours that do not 
overexpress the oestrogen receptor (oestro-
gen-receptor-negative tumours), but not in
oestrogen-receptor-positive (overexpressing) 
ones, suggesting an important difference in the 
dynamics of mutation accumulation in these 
two tumour types. 

Ellis and colleagues (page 353) focused on
oestrogen-receptor-positive breast cancers. 
They assessed the genomes of tumours from 
patients participating in pre-operative clinical 
evaluation of their response to a class of drugs 
called aromatase inhibitors. The research- 
ers showed that tumours that have a high 
frequency of cells expressing a protein, Ki67, 
which is associated with resistance to aromatase 
inhibitors, contained an elevated frequency of 
somatic mutations (those that arise during 
tumour progression, rather than being inher- 
ited) and genome-structure changes compared 
with tumours with a low frequency of Ki67- 
positive cells. This finding implicates genetic 
changes that lead to deregulation of DNA rep- 
lication and repair processes in this drug resist- 
ance. However, a single ‘smoking gun’ predictor 
of resistance did not emerge from this study. 

Finally, Banerji and colleagues (page 405)
examined somatic mutations in diverse breast 
cancers and report a rate of non-silent muta- 
tions (those that change the amino-acid 
sequence of a protein) of around one muta- 
tion per million DNA base pairs. Their study 
also added three aberrations to the list of 
those implicated in breast cancer — muta- 
tions in the genes CBFB and RUNX1, and the 
MAGI3-AKT3 gene fusion. 

The data in these five papers are a remarkable 
testament to the power of genomic technolo-
gies to define the landscape of a complex dis-
ease state, and they give us the most thorough 
view yet of the molecular underpinnings of 
breast cancers. However, much remains to 
be accomplished to move this information 
into routine clinical practice. These stud- 
ies, and those that preceded them, show that 
individual breast cancers typically carry a 
few consistent and functionally character- 
ized abnormalities, along with tens to thou- 
sands of other changes that are rare or unique 
to the individual tumour and about which 
little is known. We need to understand 
which of these genes contribute to tumorigen- 
esis or tumour progression and how this ensem- 
ble of aberrations collaborates, changes during 
tumour evolution and interacts with the tumour 
microenvironment — and thereby how it reg- 
ulates each tumour’s behaviour and response 
to therapy. Understanding such deregulated 
biology should facilitate the development 
of therapeutic approaches targeting specific 
cellular pathways, including combination 
therapies, which are likely to be needed to 
achieve more durable patient responses. 
However, therapeutic-strategy development 
will require target validation, tumour-genome
analysis (especially of tumours that respond to 
treatment), and the organization and analysis 
of clinical metadata on a scale that has not 
yet been attempted. Although this is a daunt- 
ing prospect, the promise still remains that 
the precision with which breast cancers are 
diagnosed and managed will be improved by 
the identification of new drug targets and 
by the ability to assign cancers to molecular 
subtypes that are associated with effective 
treatments. This is being made possible only 
by the remarkable advances in measurement 
science and computational capability.
NUCLEAR PHYSICS


Symmetrical tin 


The tin isotope ¹⁰⁰Sn is the heaviest ‘doubly magic nucleus’ that has an equal
number of protons and neutrons. It is now finally starting to give up its secrets, 
thanks to the persistent efforts of nuclear physicists. SEE ARTICLE P.341 


DANIEL BAZIN 


On page 341 of this issue, Hinke et al.
report how, after years of endeavour, 
they have achieved a significant leap 
forward in the study of the heaviest ‘symmetri- 
cal doubly magic’ nucleus — the tin isotope 
¹⁰⁰Sn. Composed of 50 protons and 50 neu-
trons, this nucleus is drawing the attention of
nuclear physicists around the globe because of 
its unique location in the nuclear landscape. 
Nuclei are complex and challenging 
quantum objects. In contrast to the structure 
of atoms, for which the fundamental inter- 
action between the electrons and the nucleus 
— the electromagnetic force — is known 
with great precision, the interaction between 
the constituents of a nucleus, the strong 
nuclear force, is not so well known. This is due 
in part to the composite nature of the nuclear 
constituents, or nucleons, and the nature of 
the fundamental forces that bind nucleons 
together (quantum chromodynamics and the
electroweak force). 

As far as we know, nuclei are the smallest 
objects that can be split up into their constitu- 
ents. They are therefore the smallest entities 
in which emergent properties — patterns 
that arise from complexity — can be stud- 
ied. Nuclear scientists study these emergent 
phenomena and are using them to decipher 
the nature of the nuclear force. Magic num- 
bers are numbers of protons or neutrons that 
form full shells in an atomic nucleus, and are 
perhaps the foremost emergent property of 
nuclei. Thought to have first been coined by 
the physicist Eugene Wigner, the term reflects 
the unexpected shell structure of nuclei, as 
opposed to the liquid-like behaviour expected 
for such densely packed and strongly inter- 
acting objects. In fact, the independent 
particle model used to describe atoms, in 
which electrons (the particles) are assumed to 
move independently of each other, also works 
Figure 1 | Nuclear landscape. The chart shows the location of all nuclei as a function of their neutron
number (N) and proton number (Z). Dashed lines represent magic numbers, which correspond to
full shells of protons or neutrons. Doubly magic nuclei lie at the intersections of magic-number lines.
Symmetrical nuclei have an equal number of protons and neutrons. Hinke et al. have produced and
studied ¹⁰⁰Sn, the heaviest symmetrical doubly magic nucleus. Note that the calculation used to predict
bound nuclei in this chart is limited to those with N less than 160.
remarkably well for nuclei. The model has been
able to explain, at least for stable nuclei,
the observed sequence of magic numbers: 2, 8,
20, 28, 50, 82 and 126, all of which, by virtue of 
their ‘magic’ nature, correspond to increased 
stability. 

In recent years, however, as physicists have 
expanded their reach across the nuclear land- 
scape, a different picture has emerged. The 
magic numbers observed in stable nuclei seem 
to be either vanishing or evolving, especially 
on the neutron-rich side of the nuclide chart,
which plots proton number against neutron 
number (Fig. 1). 

Nuclei that have a magic number of neu- 
trons or protons are more tightly bound than 
their non-magic counterparts, and their intrin- 
sic simplicity makes them prime candidates for 
testing proposed models of nuclear structure. 
Particularly interesting are nuclei in which the 
number of both protons and neutrons reaches 
one of the magic numbers. These doubly magic 
nuclei have even greater binding energy than 
singly magic nuclei. 

One would expect that symmetrical dou- 
bly magic nuclei, which have an equal magic 
number of protons and neutrons, would follow 
the magic number sequence, and this is indeed 
the case for light nuclei: those of helium (⁴He),
oxygen (¹⁶O) and calcium (⁴⁰Ca). However,
because of the repulsion between protons, the 
line of stable nuclei in the nuclide chart veers 
away from the symmetry line, as ever more 
neutrons are required to bind heavier nuclei 
(Fig. 1). As a result, the only two other nuclei 
that follow the magic sequence are the nickel 
nucleus ⁵⁶Ni and ¹⁰⁰Sn. These nuclei are bound
but unstable: they undergo β-decay, in which
a positron (the antiparticle of an electron) is 
emitted to produce a daughter nucleus. 
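
The counting argument above can be sketched in a few lines of Python. This is only an illustration: the function name, the element lookup table and the cut-off at Z = 50 are assumptions made for the example, with the cut-off encoding the statement that the tin nucleus with 50 protons and 50 neutrons is the heaviest symmetrical doubly magic nucleus.

```python
# Magic numbers as quoted in the text for stable nuclei.
MAGIC_NUMBERS = [2, 8, 20, 28, 50, 82, 126]

# Element symbols for the proton numbers needed here (illustrative lookup only).
SYMBOLS = {2: "He", 8: "O", 20: "Ca", 28: "Ni", 50: "Sn"}


def symmetric_doubly_magic(max_z=50):
    """Return (symbol, mass number) pairs for nuclei with N = Z = a magic number.

    max_z is an assumed cut-off: symmetrical nuclei heavier than Z = 50
    are taken to be unbound, as the text implies.
    """
    nuclei = []
    for z in MAGIC_NUMBERS:
        if z > max_z:
            break
        mass_number = 2 * z  # A = N + Z, and N = Z for a symmetrical nucleus
        nuclei.append((SYMBOLS[z], mass_number))
    return nuclei


for symbol, mass_number in symmetric_doubly_magic():
    print(f"{symbol}-{mass_number}")
```

The list reproduces the three light, stable cases followed by the two bound but unstable ones named in the text.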

Although ⁵⁶Ni is not too far from being
a stable nucleus (⁵⁸Ni is stable), ¹⁰⁰Sn is very
close to the edge of nuclear stability, where the
nuclear force between nucleons can no longer
bind them into a nucleus. The nucleus ¹⁰⁰Sn
has 12 neutrons fewer than the lightest stable
isotope of tin, ¹¹²Sn. Therein lies the particular
attraction of ¹⁰⁰Sn: it is at the same time doubly
magic and at the edge of the nuclear landscape. 
Many long-standing questions about this odd- 
ball are now beginning to be answered. For 
example, is it really doubly magic and simple 
in structure? How is the strength of its β-decay
distributed across the energy levels of its daugh-
ter nucleus, indium-100? Does it have isomeric
(metastable) states? The study of its β-decay
is particularly interesting because of the large
energy gap between the ground state of ¹⁰⁰Sn
and that of its daughter, a characteristic of nuclei 
that are close to the limits of nuclear binding. 

Unfortunately, what makes this nucleus 
attractive is also what makes it difficult to 
study. It is so far away from stable isotopes 
that it is extremely difficult to produce. 
Two types of nuclear reaction have typi- 
cally been used to attempt this feat. One, 


called fusion-evaporation, is a bottom-
up approach, in which two nuclei are fused
together with minimum excitation energy, 
so as to minimize the subsequent loss of pro-
tons or α-particles (⁴He nuclei). The other
reaction, called projectile fragmentation, is
more brutal but at present more effective. In
this approach, a high-energy projectile that
is heavier than ¹⁰⁰Sn (the xenon nucleus
¹²⁴Xe, at 1 gigaelectronvolt per atomic mass
unit in Hinke and colleagues’ experiment)
is sheared by making it collide with a
target, leaving a residue that is composed 
of 50 protons and 50 neutrons. The chance 
of ending up with the desired nucleus is 
greater in the former approach than in 
the latter, but because of the underlying 
high energy involved, the latter method is 
more efficient at finding the ‘needle in the 
haystack’, and the experiment is actually
feasible. To give a sense of the filtering (isolat- 
ing the desired nuclei from the multitude of 
other species produced by the nuclear reac- 
tion) needed to pull this off, consider that, 
out of the 1.2 × 10¹⁵ ¹²⁴Xe projectiles acceler-
ated during Hinke and colleagues’ experi-
ment at the GSI Helmholtz Centre for Heavy 
Ion Research in Darmstadt, Germany, only 
259 ¹⁰⁰Sn nuclei were identified.
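
To put the filtering in numbers, the short Python sketch below computes the identified fraction. It assumes that the projectile count in the sentence above reads 1.2 × 10¹⁵ (the exponent is an assumption of this example); the variable names are illustrative.

```python
# Assumed reading of the projectile count quoted in the text: 1.2 x 10^15.
projectiles = 1.2e15
# Number of tin nuclei (50 protons, 50 neutrons) identified in the experiment.
identified = 259

# Fraction of accelerated projectiles that ended up as an identified nucleus.
fraction = identified / projectiles
# Equivalent view: how many projectiles were needed per identified nucleus.
per_nucleus = projectiles / identified

print(f"identified fraction: {fraction:.1e}")
print(f"projectiles per identified nucleus: {per_nucleus:.1e}")
```

Under that assumption, roughly two projectiles in every 10¹³ yielded an identified nucleus, which is why the selectivity of the separation matters as much as the beam intensity.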

The results of the authors’ experiment rep- 
resent a giant step compared with previous 
attempts to study ¹⁰⁰Sn. Not only have the
experimenters greatly improved the precision 
of the half-life measurement of this isotope, 
but they have, for the first time, also deter-
mined the end point of the energy spectrum
of β-decay (the maximum energy of the emit-
ted positrons) and have observed γ-ray tran-
sitions, which correspond to decays between
states of the daughter nucleus. Their deduc-
tions are stunning: ¹⁰⁰Sn seems to have the
highest-known β-decay strength of all nuclei,
and has been classified as a ‘superallowed
Gamow-Teller decay’ (Gamow-Teller tran-
sitions allow the spin to change by 0 or ±1
between the initial state of the parent and the 
final states of the daughter). Usually, this label 
is reserved for Fermi decays (transitions that 
occur between states of the same spin), because 
they typically have the largest strengths. 

As always happens with scientists, once they 
have been given a taste of a new delicacy, they 
crave more. Other laboratories have joined 
the race and are working to improve on the 
GSI ¹⁰⁰Sn production rates. They include: the
Radioactive Isotope Beam Factory in Wako, 
part of Japan’s RIKEN national network of labs, 
which has recently synthesized ¹⁰⁰Sn nuclei;
SPIRAL2 at the heavy-ion accelerator GANIL 
in France; and the Facility for Rare Isotope 
Beams at Michigan State University. These 
facilities will produce this remarkable nucleus, 
as well as many others, in even larger quanti- 
ties. Deciphering the emergent properties of 
¹⁰⁰Sn, and of other nuclei located far from the
stability line on the nuclide chart, should lead 


scientists towards a fuller understanding of the 
nuclear force.
Early start for rocky planets


The chemical composition of stars that host small planets seems to be more varied
than that of stars that host large planets. This finding may push back the clock for the start of
rocky planets and of life around stars other than the Sun. SEE LETTER P.375 


DEBRA FISCHER 


On page 375 of this issue, Buchhave et al.
describe a spectroscopic analysis of the
chemical composition of stars host-
ing planet candidates that have been detected
by NASA’s Kepler mission. The authors have
focused on 152 stars harbouring planet can- 
didates that are a similar size to Neptune or 
smaller. The key result of this work is that 
small planets — those with a radius up to four 
times that of Earth — form under a broader 
range of environmental conditions than do gas- 
giant planets. This is an important finding, as 
it shows that the formation of small planets is 
a less constrained process than is the forma- 
tion of large planets. It also implies that small, 
rocky planets formed at an earlier epoch in the 
Universe's history than gas giants*. 

To appreciate this result, flash back to only a 
few hundred million years after the Big Bang, 
when the first stars in the Universe had just 
formed; the Universe today is approximately 
14 billion years old. Those stars were composed
of only hydrogen and helium, because heavier 
elements did not yet exist. Metals — defined 
by astronomers as all elements that are heavier 
than helium — are manufactured by the fusion 
of lighter elements in the ‘pressure-cooker’ cores 
of stars. When stars explode at the end of their 
lifetimes, their repositories of metals are cast into 
space, where they seed molecular clouds to pro- 
duce new generations of stars that are enriched in 
heavy elements. Thus powered by the life cycle of 
stars, the chemical composition of the Universe 
has evolved from a simple mixture of hydro- 
gen and helium gases to its current state replete 
with all of the elements in the periodic table. 

The chemical evolution of the Universe has 


*This article and the paper under discussion were
published online on 13 June 2012. 
profound implications for the formation of
planets. Planets are born in ‘protoplanetary’ 
disks around nascent stars; both the star and 
the protoplanetary disk inherit the chemical 
composition of the parent molecular cloud. A 
star that consists of only hydrogen and helium 
will have a disk with that same composition. 
This makes it unlikely that the first stars in the 
Universe formed with planetary companions 
or life; the elements from which planets and 
our bodies are made did not exist. Even if 
exotic gas-giant planets composed of hydro- 
gen and helium existed, they could not have 
harboured pools of liquid water or a veneer of 
prebiotic chemistry. Therefore, planets that 
could have acted as platforms for biology did 

not populate the early Universe. 
Fast-forward 13.6 billion years to the 
present time and the situation is markedly 
different. Ground- 
[…] interesting correlation
was immediately noted: a large fraction of 
gas-giant planets was being detected around 
metal-rich stars. Conversely, gas-giant
planets were rarely detected around stars that 
had metallicities (metal abundances) lower than 
that of the Sun. The general interpretation of
this planet-metallicity correlation was that the 
presence of heavy elements is an environmental 
condition that increases the density of the pro- 
toplanetary disk and therefore the efficiency of 
planet formation. In the past few years, however, 
ground-based surveys have also detected
super-Earths — planets that have masses from a 
few to ten times the mass of Earth. Interestingly, 
the host stars of these low-mass planets do not 
seem to exhibit the same high metal content as 
do hosts of gas-giant planets. 

NASA’s Kepler mission has revolutionized
exoplanet science by virtue of its vast number 
of planet detections — more than 2,000 planet 
candidates to date. Buchhave and colleagues’ 
analysis is the first comprehensive assessment 
of host-star metallicity for the super-Earth- and 
Neptune-sized planets detected by Kepler, and 
it elevates the trend inferred by ground-based 
surveys to a solid statistical result: small planets
can form around stars that have a lower metal 
content than do the host stars of gas-giant 
planets. The authors obtained the element com- 
position of Kepler stars by matching observed 
spectra to reference spectra. 

This result is exciting because it illuminates 
the history of the environmental requirements 
for planet formation. In the beginning, stars 
formed in isolation, without planets or life. As 
the chemical evolution of the Universe pro- 
ceeded, protoplanetary disks developed with a 
sufficient inventory of heavy elements to build 
the cores of planets, whether rocky or gaseous. 
However, for a gas-giant planet to form, there
is a race against time: the core needs to reach a
critical mass before the gas-rich material dissi-
pates from the disk, which occurs within about
5 million years. If the core reaches the critical
mass before the gas is lost, runaway gas accre- 
tion occurs. High metallicity helps the core to 
win this race. Small, rocky planets can accrete 
dust and rocky material long after the volatile 
gas has dissipated from the disk. They then 
acquire tenuous atmospheres by releasing light, 
volatile elements as gas or by accreting trace 
amounts of gas from the disk. This process can 
continue for 50 million to 100 million years.

Unfortunately, it is difficult to use knowl- 
edge of metallicity to pinpoint the beginning 
of the ‘age of planet formation’, because the
chemical enrichment of the Universe is not 
a monotonic process — some regions inside 
galaxies may have developed high metal- 
licity rapidly whereas other regions are still 
metal-poor today. However, knowing that the 
formation of rocky planets can occur in envi- 
ronments of lower metallicity than those of gas 
giants implies that there could be some places 
in the Universe where rocky planets and life got 
an earlier start than did Earthlings.
Bird flu in mammals


An engineered influenza virus based on a haemagglutinin protein from H5N1 avian 
influenza, with just four mutations, can be transmitted between ferrets, emphasizing 
the potential for a human pandemic to emerge from birds. SEE LETTER P.420 


HUI-LING YEN & 
JOSEPH SRIYAL MALIK PEIRIS 


animal influenza viruses, yet the molecular 

changes required for an animal virus to be 
transmitted efficiently between humans are 
poorly understood. Highly pathogenic H5N1 
avian flu viruses have circulated in poultry 
for more than 16 years, only rarely resulting 
in human infections. But when people do 
catch H5N1 bird flu, their disease is of unu- 
sual severity, raising concerns that a human 
H5N1 pandemic might have a catastrophic 
impact on public health. However, an H5N1 
virus that can be efficiently transmitted from 
human to human has not yet emerged, lead- 
ing some researchers to question whether these 
viruses are inherently incapable of acquiring 
this capacity. On page 420 of this issue, Imai 
et al. demonstrate that H5N1 viruses do have
the potential to cause a human pandemic. The 
authors identify mutations in the avian virus 
that permit viral transmission between ferrets 
by means of respiratory droplets — the best 
available model for influenza transmission in 
humans.

Imai et al. focused on the haemagglutinin 
(HA) protein of a highly pathogenic H5N1 
influenza virus, which is involved in bind- 
ing and fusion of the virus to the cells that it 
infects; the HA type is used in the viral nomen- 
clature, together with the other influenza sur- 
face glycoprotein involved, a neuraminidase 
(NA). The HA of H5N1 viruses preferentially 
binds sialic acids in receptors on the surface 
of avian cells (Siaα2,3), whereas cells of the
human upper airway predominantly have
another type of sialic acid (Siaα2,6), which is
recognized by human influenza viruses. The 
researchers introduced random mutations into 
the globular head of the HA molecule, where 
the receptor-binding domain is located, and 
searched for mutated viruses that exhibited 
enhanced binding to Siaα2,6. Using reverse
genetics, a technique that allows genetic
manipulation of the virus genome, they then
made a ‘hybrid’ H5N1 virus in which the gene
encoding one of these mutated H5 HA proteins
replaced the HA gene in the H1N1 virus that
caused a human pandemic in 2009.

The researchers infected ferrets with this 
hybrid H5N1 virus and, following multiple 
rounds of infection, isolation of the virus from 
the upper respiratory tract of the infected ani- 
mals, and reinfection in animals not previously 
exposed to the virus, they obtained a virus that 
can be transmitted efficiently between the ani- 
mals. Four key amino-acid changes in the HA 
are associated with ferret respiratory-droplet 
transmissibility in the authors’ virus (Fig. 1a). 
Three of the mutations (N158D, N224K and 
Q226L) contribute to Siaα2,6 specificity. The
fourth (T318I) lowers the pH at which the pro-
tein undergoes a structural change that allows
it to release its genetic material into the cyto- 
plasm of an infected cell by fusion of the viral 
envelope with intracellular membranes. 
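
The mutation labels in the paragraph above follow the usual shorthand for amino-acid substitutions: the one-letter code of the original residue, its position in the protein, then the one-letter code of the replacement. The Python sketch below unpacks such labels. The function name and the lookup table are illustrative assumptions, and the fourth label is read as T318I (threonine to isoleucine).

```python
import re

# One-letter amino-acid codes needed for the four labels in the text.
AMINO_ACIDS = {
    "N": "asparagine", "D": "aspartate", "K": "lysine", "Q": "glutamine",
    "L": "leucine", "T": "threonine", "I": "isoleucine",
}


def parse_substitution(label):
    """Split a label such as 'N158D' into (original, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    original, position, replacement = match.groups()
    return original, int(position), replacement


for label in ["N158D", "N224K", "Q226L", "T318I"]:
    original, position, replacement = parse_substitution(label)
    print(f"{label}: {AMINO_ACIDS[original]} at residue {position} "
          f"replaced by {AMINO_ACIDS[replacement]}")
```

Read this way, three of the labels fall in the receptor-binding region discussed in the text, and the fourth is the single change associated with the pH of the fusion step.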

There have been many previous attempts to 
determine whether H5N1 can acquire trans-
mission capacity in ferrets. Two studies
assessed H5N1 and H3N2 hybrid viruses, and
another study introduced mutations into the
H5N1 HA that are known to increase Siaα2,6
binding in H2 and H3 haemagglutinin, but
neither approach conferred transmissibility by
respiratory droplets. One study achieved par- 
tial transmissibility by introducing three HA 
mutations (Q196R, Q226L and G228S) into 
an H5N1 virus and combining this with the 
NA protein of a human seasonal H3N2 virus 
(Fig. 1b). It is interesting that the two H5 HAs 
that demonstrate transmissibility among fer-
rets (Fig. 1a,b) contain similar mutations:
one at amino-acid residues 158-160, which
removes an N-linked glycosylation site of
the globular head, and the other at residues
221-228, which alters the structure of the loop
of the receptor-binding domain.

Another research group previously produced 
a transmissible H9N2 hybrid virus using the
HA and NA of a low-pathogenic avian virus


and other genes from a human seasonal H3N2 
virus. The HA of this hybrid contains a Q226L 
mutation, which confers human-like Siaα2,6
binding (Fig. 1c). During the 10 rounds of
infection that were needed for this H9N2 virus 
to acquire transmissibility, it accumulated two 
additional HA mutations, one located at resi- 
due 189 of HA1 (close to the receptor-binding 
domain) and the other at residue 192 of HA2 
(close to the membrane-fusion domain), as 
well as one NA mutation located in the trans- 
membrane domain. Together, these studies 
demonstrate that mutations that increase 
Siaα2,6 binding, as well as those that
stabilize HA structure, seem to be function-
alities required for the HA of viruses that are
transmissible among mammals. 

Thus, Imai and colleagues’ generation of a 
respiratory-droplet-transmissible H5N1 virus
represents the culmination of a sustained effort
by numerous groups to better understand the 
mechanisms that can confer mammalian 
transmissibility in avian influenza viruses. 
However, it is likely that different combinations 
of HA mutations can achieve the same effects, 
and further studies are needed to explore this 
possibility. Furthermore, although transmission
of influenza A virus among humans is
already known to involve the virus HA, NA
and basic polymerase protein 2, it is possible 
that other viral proteins, not explored by Imai 
and colleagues, also contribute to mammalian 
transmissibility. 

It is intriguing that, although the parent 
H5N1 virus strain (A/Vietnam/1203/2004) 
used by Imai et al. causes lethal disease in 
ferrets when they are directly infected, infec- 
tion with the ferret-transmissible H5N1 virus 
does not kill the animals. It is possible that 
the change in receptor binding from Siaα2,3,
which is present on alveolar epithelial cells in
ferret and human lungs, to Siaα2,6, found on
cells of the upper respiratory tract, may change 
the virus from one that causes alveolar infec- 
tion, likely to result in more severe disease, to 
one targeting the upper airways, which is asso- 
ciated with milder symptoms. However, Imai 
and colleagues did not find that the two viruses 
differ markedly in their targeting of the upper 
or lower respiratory regions. 

Imai and colleagues’ H1N1 virus with an
H5 HA is a laboratory creation, but it should
not be considered an experimental artefact.
Natural emergence of an H5N1-H1N1 hybrid
virus is plausible. Some H1N1 and H5N1
viruses readily swap genes with one another
in vitro, generating hybrid viruses. Furthermore,
pandemic H1N1 viruses are established
in pigs in many parts of the world, and H5N1
viruses have been isolated from pigs, suggesting
that opportunities exist for the viruses to
combine in these animals.

However, these findings do not only pro- 
vide further indication that such a virus may 
arise naturally; they also pave the way for 
improved influenza surveillance and pandemic 
[Figure 1 structure labels: a, A/Vietnam/1203/2004 (H5N1); b, A/Egret/Egypt/1162/2006 (H5N1); c, A/Ferret/Maryland/P10-UMD/2008 (H9N2)]

Figure 1 | Avian haemagglutinins transmissible in mammals. The haemagglutinin (HA) protein of
influenza determines the type of target cell that the virus can infect. By mutating the site of the HA that binds
to sialic acids in receptors on target cells, which differ between avian and mammalian cells, and other protein
regions that determine the pH at which virus-cell fusion can occur, researchers have generated viruses
that have avian HA proteins that can be transmitted from mammal to mammal. a, Imai et al. identify
four mutations (N158D, N224K, Q226L and T318I) in the HA of a highly pathogenic avian influenza
virus, H5N1, that allow a virus with this HA to be transmitted by respiratory droplets between ferrets. The
receptor-binding site of the HA is shaded in yellow. b, Chen et al. also created a virus with an H5N1 HA that
can be partially transmitted between ferrets, by introducing three mutations (Q196R, Q226L and G228S)
into the H5 HA. The virus used as the basis for this hybrid already contained the N158D mutation that
Imai and colleagues also identified in their mutated HA. c, Sorrell et al. used the HA protein from a
low-pathogenic H9N2 avian virus to achieve similar transmissibility. The Q226L mutation was already present
in this virus, and two additional mutations (T189A and H192R/HA2) were acquired during ferret infection
studies. The H192R/HA2 mutation cannot be shown because the protein is cleaved at a site that lies in front
of this amino-acid residue. Full virus names and their subtypes are given below each structure. Mutated
amino-acid residues are indicated by red and blue spheres.


preparedness. For example, one of the four 
mutations reported by Imai et al., N158D, 
results in a loss of N-linked glycosylation, and 
loss of glycosylation at this residue is increasingly
seen among naturally occurring H5N1
isolates. Although the other three mutations
have only very rarely been seen in H5N1 isolates
from the field, mutations in HA that change the
virus from binding Siaα2,3 to binding both
Siaα2,3 and Siaα2,6 have been reported in both
birds and humans. These findings reinforce
the need for focusing even greater attention on 
H5N1 infections in humans and other mam- 
mals (including pigs). Influenza viruses exist 
as genetic variants termed quasi-species even 
within a single clinical specimen, but such 
genetic diversity may not be fully assessed by 
conventional sequencing methods. Investiga- 
tion of mammalian clinical specimens using 
new deep sequencing methods for these muta- 
tions, and for other mutations that confer simi- 
lar functionality, will allow us to evaluate the 
extent of H5N1 adaptation that is occurring in 
mammalian hosts. More broadly, understand- 
ing the mutations that confer mammalian trans- 
mission of avian influenza viruses will allow 
better risk assessment of the animal viruses 
that represent a pandemic threat, and help to
select virus strains against which pre-pandemic
vaccines should be generated.

Engineering H5N1 avian influenza viruses
to study human adaptation 


David M. Morens, Kanta Subbarao & Jeffery K. Taubenberger


Two studies of H5N1 avian influenza viruses that had been genetically engineered to render them transmissible between 
ferrets have proved highly controversial. Divergent opinions exist about the importance of these studies of influenza 
transmission and about potential ‘dual use’ research implications. No consensus has developed yet about how to balance 
these concerns. After not recommending immediate full publication of earlier, less complete versions of the studies, the 
United States National Science Advisory Board for Biosecurity subsequently recommended full publication of more 
complete manuscripts; however, controversy about this and similar research remains. 


Knowledgeable observers operating within a legitimate framework
for the public good have expressed divergent opinions
about the importance and public safety implications of two
papers, one recently published and one soon-to-be published, describing
the production of ferret-transmissible H5N1 influenza viruses, and
about related influenza transmission and pathogenesis research.
Some have emphasized that understanding the underlying principles
of influenza virus host adaptation and transmission can lead to better
prevention and control of viruses that arise naturally, whereas others
have drawn attention to ‘dual use’ implications (that is, bioterrorism)
or to accidental release of potentially deadly viruses. The most commonly
mentioned public safety concerns relate to three assumptions: (1)
H5N1 viruses are currently highly lethal to humans but are poorly
transmissible; (2) genetic manipulation of H5N1 viruses to increase
transmissibility in mammals such as ferrets will create variant viruses
that remain highly pathogenic and that become transmissible in
humans; and (3) if accidentally or intentionally released, such a virus
could precipitate a historically severe influenza pandemic. How do these
assumptions hold up against scientific data? In this perspective, we
address research evidence related to the epidemic/pandemic potential
of genetically engineered H5N1 viruses, and discuss limitations in
understanding how influenza viruses become pathogenic, transmissible
and potentially pandemic in humans.


Background 


Influenza is among the leading global infectious causes of death, 
periodically causing pandemics that can kill millions of people. 
Countless influenza A viruses circulate globally in a reservoir that con- 
sists of hundreds of avian species. Rarely, one of these viruses undergoes 
changes that enable it to switch hosts to infect mammals, including 
humans, although it is not clear whether human transmission can result 
directly from adaptation of an avian influenza virus (this has not been 
documented to occur), or only indirectly via further adaptation of 
pre-existing human or mammalian-adapted viruses, the mechanism 
that has been associated with all known pandemic and seasonal viruses 
after 1918. The factors underlying all such emergences are poorly 
understood. In the past 80 years of influenza virology, three pandemics
have resulted from reassortments of pre-existing human-adapted or 
mammalian-adapted viruses with one or more avian-influenza-derived 
genes, but no purely avian influenza virus has emerged to cause a pandemic 
or human outbreak, or has even become stably adapted to humans.
However, because avian influenza viruses have adapted to other
mammals, it is considered plausible that such an emergence could occur 
in humans. 

Among many other important research areas related to influenza, it is 
therefore critical to study the mechanisms by which influenza viruses 
emerge from birds to become adapted to mammals and ultimately 
humans, and to learn how the phenotypic properties of such evolving 
viruses may be associated with human transmission and disease. Among 
the many subtypes and strains of avian influenza A viruses that exist in 
nature, those that have at least occasionally infected mammals (for 
example, H5N1, H7N7 and H9N2) are of interest because they might
theoretically be more likely than other influenza A viruses to adapt
directly or indirectly to humans. Highly pathogenic avian influenza
(HPAI) H5N1 viruses have been of particular interest with respect to
theoretical pandemic potential because they have been unusually patho- 
genic in domestic poultry and have infected and killed several hundred 
people over a 15-year period. 

In seeking to understand such influenza viruses, a research approach 
used widely in virology is to engineer specific genetic mutations into 
naturally occurring viruses, and then study the resulting viral pheno- 
typic properties in animals, including infectivity, cell tropism, viral rep- 
lication, pathogenicity and transmissibility. These types of experiments 
can potentially provide clues about whether and how a virus might adapt
to humans, and what prevention and control options might be useful if 
that virus did emerge. Much H5N1 research of this type has already been 
published, including viral genetic engineering to evaluate properties 
such as pathogenicity and transmissibility in ferrets and other animal 
models. In the context of this published research literature, we comment 
on questions relevant to the two papers under discussion.


H5N1 infectivity for humans


The ongoing HPAI H5N1 enzootic continues to cause ‘spill-over’ human 
infections. World Health Organization (WHO) data indicate that since 
2003, HPAI H5N1 viruses have infected 603 people and killed 356 (ref. 9). 
Technically, the term ‘highly pathogenic’ refers only to the effects of 
certain H5 and H7 influenza viruses in poultry, not in humans or other 
mammals; most such viruses either cannot infect, or are relatively 
harmless in, humans. HPAI H5 and H7 phenotypes are both associated 
with mutations in the haemagglutinin (HA) gene that usually result from 
insertion of a sequence of codons encoding multiple basic amino acids at 
the location where the two linked protein domains comprising the 
mature HA are cleaved during infection. This cleavage site change leads
to disseminated viral replication in multiple organs of avian species, 
resulting in high mortality. However, in humans, who cannot be easily 
infected with most low pathogenicity or HPAI viruses, if they can be 
infected at all, efficient replication outside the respiratory tract generally 
does not occur. Therefore, despite the current unusual situation with 
H5N1 viruses in humans since 2003 (see below), neither H5N1 nor other 
HPAI viruses would necessarily replicate systemically or cause extreme 
pathogenicity should human adaptation occur. Although the basis of 
HPAI H5N1 viral pathogenicity in severe and fatal human cases remains
unknown, there is no evidence suggesting that it results from changes 
known to be associated with viral adaptation to gallinaceous poultry; in 
fact, no human-adapted or pandemic influenza virus contains genetic 
changes indicative of prior poultry adaptation. 

Solely on the basis of publicly available information about pathogenicity 
of intranasally inoculated H5N1 virus (the model for natural human and 
animal infection), the laboratory-derived H5N1 viruses produced in the 
two papers under discussion do not have enhanced pathogenicity in
ferrets compared to the 2009 pandemic H1N1 virus, which is considered
to be mildly pathogenic for humans. An apparent misconception has
nevertheless arisen in recent public discussions of these studies, namely 
that the engineered, ferret-transmissible H5N1 viruses were extremely 
pathogenic in ferrets after intranasal inoculation or aerosol transmission. 
This notion seems to have resulted in part from misunderstandings about 
a technique (intratracheal inoculation) used in a separate sub-study
reported in the manuscript by the Fouchier group, a method that is
not directly relevant to viral transmissibility or natural pathogenesis. As
documented since the 1940s, intratracheal inoculation of influenza
viruses is not a ‘model’ for natural viral pathogenicity; influenza viruses
that are otherwise considered to be of low pathogenicity often induce
severe and even fatal disease in animals when administered by this route,
including the 2009 pandemic H1N1 virus. The presentation of transmissibility
studies alongside high-dose intratracheal inoculation pathogenesis
studies in the Fouchier manuscript seems to have suggested
(incorrectly) to some that the engineered transmissible H5N1 virus is 
deadly after intranasal inoculation or aerosol transmission between ferrets 
and, by extension, might be both transmissible and deadly for humans, that 
is, a virus of deadly pandemic potential. No evidence has yet been presented 
to support this, although the possibility that additional unspecified genetic 
changes might do so cannot be excluded. 


Potential for human adaptation of H5N1 viruses 


It is questionable to what extent HPAI H5N1 is adapted to, or capable of 
adapting to, humans. It is not clear why one evolving lineage of avian 
HPAI H5N1 viruses, out of a large and genetically divergent pool of H5
and other avian influenza viruses that rarely infect humans (much less 
cause severe human disease), has recently infected hundreds or perhaps 
thousands of people. It may be that the human cases are a result of 
unusual high-dose exposures or rare individual genetic susceptibilities. 
Alternatively, H5N1 viruses may be beginning to do something no other 
HPAI virus has ever been documented to do—adapt directly to humans. 
And if H5N1 did adapt, could it cause a pandemic? 

No HPAI virus in the historic record has ever been efficiently trans- 
mitted between humans, let alone caused a pandemic. Even when avian 
influenza virus genes have been imported by reassortment into existing 
human influenza viruses, as happened for example in 1957 with H2N2 
influenza and in 1968 with H3N2 influenza, the sources seem to have 
been circulating low-pathogenicity avian viruses, not poultry-adapted 
viruses such as HPAI viruses. Conceivably, the considerable host-switching
mutations associated with adaptation of wild bird viruses to
gallinaceous poultry, or at least of wild viruses to HPAI poultry viruses,
represent an evolutionary pathway divergent from those pathways
associated with mammalian adaptation, seemingly presenting an additional
challenge for poultry-adapted influenza viruses to achieve efficient
mammalian adaptation. After 15 years of high-density enzootic
circulation in domestic poultry around the world, no human-adapted
H5N1 virus has emerged from a natural reservoir, suggesting the existence
of unknown biological barriers.

Despite circulation of influenza A viruses of 16 HA subtypes in billions 
of birds over a very long time span, the four pandemics in the last century 
have been restricted to influenza viruses bearing HA subtypes H1, H2 or 
H3. Decades ago, many experts predicted that influenza pandemics could 
be explained by ‘recycling’ of a small number of HAs in new human 
generations; more recently, this belief has been expanded to posit that 
the other HA subtypes (including H5) are fundamentally incapable of 
adapting to humans, being selected against by biological constraints or 
unappreciated selection pressures. Despite widespread influenza
virus circulation and dynamic evolution at the human-animal interface, 
with many billions of quasispecies, mutations and gene constellations 
circulating, only four influenza pandemics have occurred in the last 
century, and in the three of those with a known viral origin the viruses 
resulted from reassortment of pre-existing human or swine viruses, not
from mutation or adaptation of existing avian viruses.

This suggests that de novo emergence of a human pandemic influenza 
virus is an extremely rare event that is not easily achieved in nature, and
presumably would not be easily achieved by engineering a small number 
of laboratory mutations. As some of the key engineered H5N1 mutations 
in the two studies occur spontaneously during normal laboratory 
passage, or have been found singly or in combination in natural
H5N1 and in other influenza viruses, including strains from wild
birds, it remains unclear whether or how the engineered viruses in 
question create or increase the risk of a pandemic. 


Engineering H5N1 phenotypic changes 

Serial passage of a virus in intact animals or in tissues derived from a 
particular species often results in enhanced species-specific virulence, 
which can be applied to establish an animal model with measurable 
morbidity and/or mortality outcomes useful for evaluating antiviral 
therapeutics, passive immunization and vaccines. Influenza viruses, 
SARS coronavirus and Ebola virus have all been passaged in mice to 
enhance virulence; the resulting host-adapted viruses have been studied 
biologically and used to evaluate strategies for control and prevention. 
However, adaptational mutations resulting from serial passage tend to 
be host-specific and may not produce the same outcomes in other species. 
For example, the classical swine influenza virus, A/swine/Iowa/1930 
(H1N1), is very pathogenic in ferrets and mice but poses no threat to
humans. Another example is mouse-adapted Ebola virus, which is lethal
for mice and guinea pigs but attenuated for nonhuman primates.
Ferrets are susceptible to a wide range of viruses including influenza 
viruses, SARS coronavirus, canine distemper virus and some parvoviruses, 
many of which do not infect humans or other mammals. A number of 
influenza viruses that replicate efficiently in ferrets seem poorly able,
or unable, to infect humans, even after experimental challenge. Thus,
pathogenicity and transmissibility of any influenza virus in ferrets cannot 
be used directly to predict what type and severity of disease the same virus 
might produce in humans and human populations. 


Predicting human transmissibility 


It is unclear whether genetic manipulation of an H5N1 virus to achieve 
transmissibility in a particular mammal such as a ferret can predict 
human transmissibility. Because natural history and viral challenge 
studies cannot always be performed in humans, they have been con- 
ducted in experimental animals including mice, guinea pigs, ferrets, 
non-human primates and various other mammals. Unfortunately, there 
is no perfect animal model capable of reproducing all of the important 
variables involved in human influenza infection, although each animal 
model may be useful in understanding some aspect of influenza biology. 
Unlike most other mammals, ferrets generally can be infected with many 
or most avian, mammalian and human influenza viruses without prior 
viral adaptation, and often transmit efficiently between them, providing
useful general information about the viral genetic basis of phenotypic
properties such as infectivity, pathogenicity, transmissibility and
immune responses, even though the findings cannot necessarily be
directly applied to human infections. Furthermore, in decades of
research, using a large number of different avian and mammalian 
influenza viruses, severe or fatal disease has not often been observed 
in ferrets following intranasal inoculation or aerosol exposure. 

These useful traits of easy infectability and mildly symptomatic infec- 
tion have rendered the ferret a ‘permissive’ influenza model. Specifically, 
many naturally occurring influenza viruses that infect, and often transmit 
between, ferrets are not known to infect people or cause human 
disease. Ferrets are thus an imperfect model for predicting
human infectivity or transmissibility, let alone the high level of trans- 
missibility characteristic of pandemic spread. On the basis of public pre- 
sentations by the senior authors of the two studies in question, neither 
of the engineered H5N1 viruses was as efficiently transmissible in ferrets 
as the human-adapted 2009 pandemic H1N1 virus. Phenotypic
properties such as replication, pathogenicity and transmissibility are likely
to be polygenic traits driven by mutations that are independent and
possibly competing. Transmissibility is a complex phenotype that
probably requires cooperative changes in more than one gene segment,
and these may differ greatly between different viruses that become transmissible.
Mutations that confer transmissibility in a ferret may be species-specific
and irrelevant to other hosts. There are probably multiple unique
virus-specific pathways to transmissibility for particular viruses infecting
particular hosts. For example, transmissibility of the 1918 pandemic
H1N1 virus has been linked to changes in the genes encoding HA and
PB2 proteins, whereas transmissibility of the 2009 pandemic H1N1
virus, which has a closely related HA, has been linked to gene segments
encoding the neuraminidase and matrix proteins.

Moreover, because determinants of viral pathogenicity may be lost 
upon adaptation to a new host, H5N1 viruses, whether or not trans- 
missible, do not always cause severe disease in ferrets or non-human 
primates. For these reasons viral phenotypes observed in animal
models like the ferret may not predict what would happen in humans. 
Indeed, given that many influenza viruses that are non-pathogenic for 
humans easily transmit and may cause illness in ferrets, the ‘ferret 
model’ does not reliably predict human transmissibility or pathogenicity, 
although the model remains valuable for understanding the principles 
and dynamics of infection. 

In addition to data from experiments in mammals, it is noteworthy 
that of the many mammalian-adapted influenza viruses that infect and 
transmit explosively among pigs, among horses and among dogs, few 
infect humans and none are transmitted between them. Although swine
influenza viruses caused sporadic human infections before 2009, and
caused the 2009 H1N1 pandemic, outbreaks associated with human
influenza viruses are rare in pigs. It is even conceivable that H5N1 viruses
have already become adapted to mammals without causing severe disease
or onward transmission to humans. Evidence from China’s Qinghai
Lake, for example, shows 13.4% H5N1 seropositivity and 3.4% active
infection in apparently healthy, live-caught small mammals called pikas.
Clearly, adaptation of an influenza virus to a specific mammalian 
host does not necessarily predict its infectiousness, pathogenicity or 
transmissibility in other mammals, even though valuable insights into 
mechanisms of disease, host responses, and prevention and treatment 
may be obtained by studying these particular animals. Such insights 
can provide valuable clues in designing countermeasures against deadly 
epidemics and pandemics. 


H5N1 case-fatality rate


Belief that an H5N1 virus could produce a 59% pandemic case-fatality 
rate is the most frightening aspect of the current controversy surround- 
ing aerosol transmission of H5N1 virus in ferrets. In 500 years of obser- 
vation, no influenza pandemic is believed to have caused a case-fatality 
rate over about 2% (ref. 52); pandemic and seasonal circulation of H1, 
H2 and H3 influenza viruses over the past century have all produced 
lower overall mortality rates. The widely cited 59% figure is not a
mortality rate but a case-fatality rate computed from cumulative cases
reported to WHO since 2003. (Case fatality refers to fatal cases divided
by all fatal plus non-fatal cases combined.) By general consensus, the
WHO figure probably overestimates actual mortality. Among other
concerns common to epidemiological data, the WHO case definition
features diagnostic severity criteria (evidence of an acute pneumonia on 
chest radiograph plus evidence of respiratory failure) that constitute a 
self-fulfilling prophecy for fatality; as with many illnesses studied 
epidemiologically, severe diseases are more likely to receive optimal 
diagnostic work-up (‘detection bias’); and most H5N1 cases have been 
reported from countries with limited resources for identifying milder cases, 
if they occurred. These factors together could combine to erroneously 
inflate case-fatality calculations by over-counting severe cases and 
under-counting mild cases.
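
The definition given above reduces to a single division. As a minimal sketch in Python (the function name `case_fatality_rate` is ours, introduced only for illustration), applying it to the cumulative WHO counts quoted earlier in this article, 603 reported infections and 356 deaths, reproduces the widely cited figure:

```python
def case_fatality_rate(deaths: int, cases: int) -> float:
    """Fatal cases divided by all reported cases (fatal plus non-fatal)."""
    if cases <= 0:
        raise ValueError("cases must be a positive count")
    if not 0 <= deaths <= cases:
        raise ValueError("deaths must lie between 0 and cases")
    return deaths / cases


# Cumulative WHO counts since 2003, as quoted in this article.
WHO_REPORTED_CASES = 603
WHO_REPORTED_DEATHS = 356

who_cfr = case_fatality_rate(WHO_REPORTED_DEATHS, WHO_REPORTED_CASES)
print(f"Case-fatality rate among reported cases: {who_cfr:.1%}")
```

With these counts the result is about 59%. The denominator contains only cases that were detected and reported, which is why over-counting severe cases and under-counting mild ones inflates the figure.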

However, potentially more important clues to actual H5N1 
pathogenicity and human case-fatality rates come from epidemiological 
studies, which taken together suggest to us that H5N1 may not be highly
lethal except in people with rare susceptibilities. Forty-six published
H5N1 seroprevalence studies of various exposure categories (household
contacts, healthcare workers, poultry workers, and so on) show generally 
low H5N1 seroprevalence (mean, 1.7% of 21,435 persons examined in all 
46 studies combined (a bibliography of these studies is available from the 
authors on request)). Given intense poultry and other exposures in many 
study areas, these low rates at first seem perplexing, especially when 
compared to the much higher seroprevalence rates in humans for other 
avian influenza viruses such as H9N2 (ref. 56). When such information is 
considered in light of statistically significant clustering of non-human- 
transmitted (that is, presumably avian-acquired) household cases in 
genetically related versus unrelated persons, a reasonable explanation
seems to us to be that H5N1 is so poorly adapted to humans that
exposure does not normally lead to infection or even the development of a
detectable immune response, except in persons with specific but
undefined genetic susceptibilities, many of whom become cases.
There are few data on what the basis of such genetic susceptibilities
may be, although recent evidence has linked severe human influenza
to a minor IFITM3 allele, supporting the suspicion that genetic determinants
of influenza infection and replication in humans do exist.

A published meta-analysis of selected seroprevalence studies implies 
that the actual H5N1 case-fatality rate may be far below 1% (ref. 56), and 
thus probably far below the case-fatality rates for seasonal influenza. 
This has been disputed because it has been difficult to find mild cases, 
and because of the possibility that some low-level antibody titres 
(<1:80) might be false positives. On the other hand, rapid disappearance
of human H5N1 vaccine-induced antibody suggests that the
opposite problem of false negatives could be occurring and, if so, might 
be especially problematic in cross-sectional studies in which the time 
since infection is not known, and which could in some cases be long 
enough for antibody titres to wane to sub-threshold or undetectable 
levels. 

Given such confusing information, there has been little agreement so 
far on the important question of asymptomatic and undetected H5N1 
infections. But whatever the case, unless healthy seropositive people 
detected in seroprevalence studies temporally and geographically asso- 
ciated with H5N1 cases are all falsely seropositive, their addition to 
exposure denominators greatly decreases case-fatality determinations. 
For example, the 1997 Hong Kong H5N1 outbreak case-fatality rate of 
33.3% (ref. 65) drops to around 3% with the addition of exposed 
seropositive persons detected in the related seroprevalence studies. 
Similar recalculations of other data would yield far lower rates, and 
wider seroprevalence studies would undoubtedly lower case-fatality 
rates even further. 
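
The effect of the denominator can be made concrete with a short, hedged calculation. The sketch below assumes that the 1997 Hong Kong outbreak comprised 18 confirmed cases with 6 deaths; these counts are our assumption, chosen because they are consistent with the 33.3% rate quoted above, and are not stated in this article. Because the number of exposed seropositive persons is also not given here, the code works backwards and asks how many total infections a rate of about 3% would imply. The function names are hypothetical.

```python
def case_fatality_rate(deaths: int, infections: float) -> float:
    """Deaths divided by all infections counted in the denominator."""
    if infections <= 0:
        raise ValueError("infections must be positive")
    return deaths / infections


def infections_implied_by_rate(deaths: int, target_rate: float) -> float:
    """Total infections needed for `deaths` to correspond to `target_rate`."""
    if not 0 < target_rate <= 1:
        raise ValueError("target_rate must be in (0, 1]")
    return deaths / target_rate


# Assumed counts for the 1997 Hong Kong outbreak (not given in the article).
HK_DEATHS = 6
HK_CONFIRMED_CASES = 18

reported_rate = case_fatality_rate(HK_DEATHS, HK_CONFIRMED_CASES)
implied_infections = infections_implied_by_rate(HK_DEATHS, 0.03)
implied_additional = implied_infections - HK_CONFIRMED_CASES

print(f"Rate among confirmed cases: {reported_rate:.1%}")
print(f"Infections implied by a 3% rate: {implied_infections:.0f}")
print(f"Additional infections beyond confirmed cases: {implied_additional:.0f}")
```

Under these assumptions, a rate near 3% corresponds to roughly ten times as many infections as confirmed cases, which illustrates how strongly the calculation depends on who is counted as infected.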

Thus, an explanation for the apparent case-fatality rate/seroprevalence 
paradox may not be purely one of missing cases. Like other poorly adapted 
viruses that rarely infect humans, the H5N1 virus may simply be productively
infecting too few of the people exposed to it, even in situations of
widespread human contact, leaving minimal immunological evidence of
exposure at the population level, while at the same time ‘finding’ and
infecting those occasional persons with unusual susceptibilities to it; that
is, cases. Even so, it should be remembered that limited spread of a deadly
H5N1 virus, or pandemic H5N1 spread associated with a far lower case- 
fatality rate, would still be of public health concern. 


The dangers of information release 


Owing to global concern over a possible H5N1 influenza pandemic, the 
pathogenicity, immunogenicity and transmissibility of naturally occurring 
and laboratory-derived H5N1 viruses have been examined extensively and 
safely using high-containment facilities and appropriate regulatory and 
safety oversight (see later). The two H5N1 studies under discussion build
upon and are the logical extensions of dozens of similar published studies
performed in the wake of the 1997 Hong Kong H5N1 outbreak. This
research includes another published study in which genetic engineering
of the H5N1 virus was able to newly create transmissibility in ferrets,
a similar study in which increased ferret transmissibility was not
documented, and a study in which transmissibility was restored and
arguably increased in guinea pigs. None of these publications, including
the prior publication of engineered H5N1 transmissibility in ferrets,
led to concern among scientists, federal agencies or the public. 

Such studies feature numerous pathogenicity-associated, and
sometimes transmissibility-associated, mutations affecting the HA-receptor-binding
site, including changes that enhance receptor affinity
for α2-6-linked sialic acid receptors, thought to be important for human
adaptation. Other studies have examined mutations associated
with changing antigenicity, changes associated with fusion,
changes associated with the polybasic HA cleavage site, and
virulence factors in the polymerase proteins, crucial for viral
replication, and in the non-structural protein (NS1), involved in
antagonizing host type I interferon responses. This widely available
body of published research complicates determination of what to do 
with these two and with similar research manuscripts that seem likely to 
continue to appear. Withholding or redacting them does not prevent 
anyone from piecing together the basic information that they contain. 
Most of this information is generally known and relatively obvious, 
has already been published, and is now being widely publicized and 
discussed as a result of increased attention''!?"**!*?, 

Some would argue that even this background research should not 
have been done, or should henceforth be classified and made available 
only to ‘approved’ scientists who would be vetted by yet-to-be- 
determined mechanisms****. But had these former studies not been 
made available in the open literature, the field of influenza research 
would have been considerably impeded and our current state of 
knowledge and readiness for responding to future outbreaks and/or 
pandemics would be lessened. Some proposed that ‘censoring’ this 
information actually increases the risk of bioterrorism*’. The two studies 
under discussion’* can help augment surveillance to detect naturally 
emerging viruses with pandemic potential and expand our knowledge of 
the principles underlying host adaptation. Although the danger of
‘information release’ in the case of these two studies is probably small
or nil, because all or most of the critical information is widely available
anyway, it nevertheless remains important to rethink larger questions
about balancing safety (accidental or deliberate release of an influenza 
virus or any dangerous pathogen) with the need to study such viruses to 
learn enough of their biology to prevent and control them. These are 
important issues that should be discussed broadly among scientists, 
policy makers and the public. 


Biosafety and biosecurity concerns 

As novel pathogens emerge, scientists must be able to continue to work 
with them safely and appropriately in teams using the talents of many 
highly trained researchers. Numerous layers of robust biosafety and bio- 
security protection and oversight are in place to safeguard the scientists 
and the public alike, including rigorous safety training, biocontainment 
practices, regulations and oversight, select agent rules, background investi- 
gations and biosurety oversight**. The H5N1 studies under discussion’? 
were both performed in high containment laboratories with rigorous and
appropriate oversight and biosecurity measures’*, as is the case for all 
such research in the US. 

Few disagree that it is crucial to continue research with H5N1 and 
other emerging infections, including investigation of how emerging 
pathogens adapt to new hosts and cause disease. However, it is import- 
ant to ask whether some types of infectious diseases research should not 
be done, or not published openly. If so, we need criteria to identify such 
research in advance, and processes to balance the importance of the 
research knowledge with the importance of preventing adverse con- 
sequences of the research*’. Even with the eventual publication of the 
two H5N1 studies, questions about how such research should be
approved, evaluated and made public remain unanswered*™’. The bio- 
medical field is built on more than a century of openness and full 
publication/broad discussion of all findings; it is unclear how redacted 
publications of future scientific data can be accomplished, and what 
effect such a system would have on science and scientific progress. 

These complex questions have been asked and answered in the past*’, 
and are being asked again in the context of these two papers. Continued 
discussion and decisions about how to deal with this research will be 
of importance to scientific progress and public health. We believe that 
it is important to consider the broader context of research aimed at 
understanding how influenza viruses adapt to humans. H5N1 is only 
one of many avian influenza viruses. If, as we believe existing data 
suggest, pathways to human adaptation are many, virus-specific, and 
with few common denominators, it will be important to study not just 
H5N1 but a wide range of avian and mammalian- and human-adapted 
viruses, including studies that feature backward genetic engineering to 
remove phenotypic determinants of adaptation, studies in nonhuman 
primates and, when safe and appropriate to do so, in human challenge 
studies™. 


Future directions 


The H5N1 controversy underscores how little is known about determinants 
of human influenza pathogenicity and transmissibility, which are among 
the most fundamentally important questions in infectious disease research 
because of the huge burden of influenza. 

In the past two decades the question of pursuing and publishing 
potential ‘dual use’ infectious disease research has always been decided
in favour of conducting and publishing the research; for example, 
delineating the genomes of smallpox” and SARS viruses”, defining 
the pathogenicity of neuraminidase-inhibitor-resistant influenza 
viruses”'”’, genetically altering and making ferret transmissible both a 
more pathogenic pandemic H2N2 influenza virus”? and an H9N2 
avian influenza virus**, and resurrecting from RNA fragments, 
recreating and studying in vivo the 1918 pandemic influenza virus’*”®. 
In the latter case, important findings already have markedly enhanced 
our understanding of the emergence, transmissibility and pathogenicity 
of that important virus, helping us to prepare for and respond to the 
emergence of other influenza viruses. Examples include using the 1918 
HA crystal structure in vaccine design”, investigating the role of the host 
immune response in disease”*””, identification of mutations associated 
with pathogenicity and host adaptation*™’"”'"', understanding influenza 
evolution”’, and helping guide and target the response to the 2009 H1N1 
pandemic**'™. All of this work has been done safely with appropriate 
oversight, and without negative consequences. 

In considering the threat of bioterrorism or accidental release of 
genetically engineered viruses, it is worth remembering that nature is the 
ultimate bioterrorist. Indeed, H5N1 mutations, including some of those
made in the two studies under discussion'”, occur spontaneously in nature, 
probably at a high rate, although they have not yet led to a pandemic. Given 
the relative rarity of pandemics caused by newly emerging influenza 
viruses, their explosive transmissibility may result from unique and 
virus-specific mutational changes that arise at very low frequency. For 
past pandemics, we have had limited ability to detect such changes by 
surveillance or by animal model experimentation. Thus, our best hope in 
preventing and controlling the microbial agents that continually
challenge us is to increase fundamental knowledge about the mechanisms 
by which they emerge, spread and cause disease, so that we can develop 
countermeasures such as enhanced surveillance, better diagnostics, 
vaccines and drug therapies. In moving forward we need to be safety 
conscious and to have consensus safety measures and policies in place, 
while at the same time using all available tools to seek broad understand- 
ing about the complex relationships between viruses and hosts. It is only 
this knowledge that stands between us and the devastation of future 
influenza pandemics. In reconsidering the proper balance between 
progress and safety, the critical importance of advancing scientific 
knowledge needs to be kept front and centre. 
Superallowed Gamow-Teller decay of the
doubly magic nucleus ¹⁰⁰Sn
H. Schaffner’, C. Scheidenberger’, S. Schwertel', P.-A. Söderström!’, S. J. Steer®, A. Stolz?’ & P. Strmeň!®


The shell structure of atomic nuclei is associated with ‘magic numbers’ and originates in the nearly independent motion
of neutrons and protons in a mean potential generated by all nucleons. During β⁺-decay, a proton transforms into a
neutron in a previously not fully occupied orbital, emitting a positron–neutrino pair with either parallel or antiparallel
spins, in a Gamow-Teller or Fermi transition, respectively. The transition probability, or strength, of a Gamow-Teller
transition depends sensitively on the underlying shell structure and is usually distributed among many states in the
neighbouring nucleus. Here we report measurements of the half-life and decay energy for the decay of ¹⁰⁰Sn, the
heaviest doubly magic nucleus with equal numbers of protons and neutrons. In the β-decay of ¹⁰⁰Sn, a large fraction
of the strength is observable because of the large decay energy. We determine the largest Gamow-Teller strength so far
measured in allowed nuclear β-decay, establishing the ‘superallowed’ nature of this Gamow-Teller transition. The large
strength and the low-energy states in the daughter nucleus, ¹⁰⁰In, are well reproduced by modern, large-scale shell
model calculations.


Gamow-Teller transitions, in which a proton is transformed into a 
neutron or vice versa, while possibly flipping its spin, represent an 
important spin-isospin degree of freedom in atomic nuclei. They are 
important in many astrophysical processes: they govern, for example, 
electron capture during the core collapse of supernovae. Furthermore, 
a detailed understanding of Gamow-Teller transitions will provide an 
essential constraint on the neutrino mass, in the event that neutrino-less
double β-decay is ever observed. Most of the Gamow-Teller
strength is found in the collective Gamow-Teller giant resonance 
(GTGR) of the neighbouring nucleus, which is typically a broad struc- 
ture composed of many states. Whereas in charge-exchange reactions 
in stable nuclei the full GTGR is accessible, the Gamow-Teller 
strength in unstable nuclei can, so far, only be studied through 
β-decay. However, β-decay studies can observe only the fraction of
the total Gamow-Teller strength within the decay energy window. 
Towards more proton-rich nuclei, this window becomes larger. 
Nevertheless, it is still experimentally challenging to detect all small 
components of the Gamow-Teller strength’”. Thus, in most nuclei, 
measuring the full Gamow-Teller strength is difficult because it is 
fragmented and only partly accessible in β-decays.

¹⁰⁰Sn has N = 50 neutrons and Z = 50 protons, and as a result
has completely occupied shells. It is therefore called ‘doubly magic’
and is particularly suited both experimentally and theoretically to the
study of Gamow-Teller transitions. The closed N = Z = 50 shells
reduce the effect of long-range correlations, thus decreasing the
amount of fragmentation of the GTGR. Theoretical predictions suggest
that a single state is dominantly populated in this decay. At the
same time, the energy window for β-decay is ~7.4 MeV (ref. 3), and
most of the GTGR is therefore accessible. Such a situation for a doubly
magic system is realized nowhere else in the Segrè chart (a two-dimensional
lattice in which all known nuclei are arranged with N
and Z on the x and y axes, respectively): ¹⁶O and ⁴⁰Ca are stable nuclei;
⁵⁶Ni has too small a Q_EC value (the energy available for β⁺-decay or
electron-capture decay) to make the Gamow-Teller resonance
observable in β-decay; and doubly magic nuclei with N = Z that are
heavier than ¹⁰⁰Sn are unbound. Also, a recent experiment shows that
⁵⁶Ni has a much more fragmented Gamow-Teller strength‘ as a result
of a less robust N = Z = 28 doubly magic shell closure as well as subtle
differences in the shell structure (Methods Summary and
Supplementary Information).

In an extreme, pure single-particle picture, the only possible
Gamow-Teller transition of ¹⁰⁰Sn is the decay of a proton in the
completely filled g9/2 shell to a neutron in the empty g7/2 shell because the
g9/2 neutron orbital is filled and no levels above Z = 50 are occupied.
The large energy separation (shell gap) between these spin-parallel
(g9/2) and spin-antiparallel (g7/2) orbitals, for which the orbital angular
momentum is L = 4, is responsible for 50 being a magic number. The
β-decay of ¹⁰⁰Sn is supposed to be enhanced as a result of the large


¹Physik Department E12, Technische Universität München, D-85748 Garching, Germany. ²GSI Helmholtzzentrum für Schwerionenforschung GmbH, D-64291 Darmstadt, Germany. ³Istituto Nazionale di
Fisica Nucleare, Laboratori Nazionali di Legnaro, 35020 Legnaro, Italy. ⁴The Henryk Niewodniczański Institute of Nuclear Physics (IFJ PAN), 31-342 Kraków, Poland. ⁵TRIUMF, Vancouver, British Columbia
V6T 2A3, Canada. ⁶School of Physics & Astronomy, The University of Edinburgh, Edinburgh EH9 3JZ, UK. ⁷Université de Strasbourg, IPHC, 67037 Strasbourg Cedex, France. ⁸Department of Physics,
University of Surrey, Guildford GU2 7XH, UK. ⁹Physics Department, Faculty of Science, Ankara University, 06100 Tandoğan, Ankara, Turkey. ¹⁰Institute of Nuclear Physics, University of Cologne, D-50937
Köln, Germany. ¹¹Institute Vinča, University of Belgrade, 11000 Belgrade, Serbia. ¹²IFIC, CSIC-University of Valencia, E-46071 Valencia, Spain. ¹³RIKEN Nishina Center, Wako, Saitama 351-0198, Japan.
¹⁴Grand Accélérateur National d’Ions Lourds, CEA/DSM-CNRS/IN2P3, 14076 Caen, France. ¹⁵Comenius University, 818 06 Bratislava 16, Slovakia. ¹⁶Institute of Experimental Physics, University of
Warsaw, PL-00681 Warsaw, Poland. ¹⁷Institut für Kernphysik, Technische Universität Darmstadt, D-64289 Darmstadt, Germany. ¹⁸Department of Physics & Astronomy, Uppsala University, SE-75120
Uppsala, Sweden. ¹⁹Departamento de Fisica i Enginyeria Nuclear, Universitat Politecnica de Catalunya (EUETIB), E-08036 Barcelona, Spain. ²⁰KVI, University of Groningen, 9747AA Groningen, The
Netherlands. ²¹National Superconducting Cyclotron Laboratory, Michigan State University, East Lansing, Michigan 48824-1321, USA.


21 JUNE 2012 | VOL 486 | NATURE | 341 


number of protons occupying the g9/2 shell, which can decay to the
mostly empty neutron g7/2 shell. This would lead to a GTGR consisting
of only a single I^π = 1⁺ level (I, spin; π, parity) with a large
Gamow-Teller strength of about B_GT = 10, taking into account the standard
renormalization factor (0.75) of the Gamow-Teller matrix element
due to configurations outside the model space’. This unique situation
has been termed ‘superallowed’ Gamow-Teller decay®. Even in more
realistic models, including particle-hole correlations, the
Gamow-Teller decay of the ground state of ¹⁰⁰Sn is predicted to populate with
more than 95% probability a single 1⁺ state in ¹⁰⁰In at an excitation
energy of about 3 MeV. In these calculations, a Gamow-Teller strength
of around 8–14 is obtained’””, leading to renormalized predictions of
5–7 (refs 8, 10). These theoretical results are summarized in Methods
Summary and Supplementary Information.
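
The single-particle estimate of B_GT ≈ 10 can be reproduced with a short calculation. The following is a minimal sketch, assuming the textbook extreme single-particle expression B_GT = N_π(g9/2) × (1 − N_ν(g7/2)/8) × 4L/(2L + 1), which is not written out in the article; the function and argument names are illustrative only. The renormalization factor of 0.75 acts on the matrix element, so it enters the strength squared.

```python
def espm_gt_strength(n_protons_g92=10, n_neutrons_g72=0,
                     orbital_l=4, matrix_element_quenching=0.75):
    """Extreme single-particle Gamow-Teller strength for g9/2 -> g7/2.

    Assumed formula (not given explicitly in the article):
        B_GT = N_p * (1 - N_n / 8) * 4L / (2L + 1) * quenching**2
    where N_p is the number of protons in g9/2 (10 when full) and N_n is
    the number of neutrons already occupying g7/2 (capacity 8).
    """
    blocking = 1.0 - n_neutrons_g72 / 8.0           # Pauli blocking of the final orbital
    per_proton = 4.0 * orbital_l / (2.0 * orbital_l + 1.0)
    bare = n_protons_g92 * blocking * per_proton    # unrenormalized strength
    return matrix_element_quenching ** 2 * bare


if __name__ == "__main__":
    print(f"bare strength:         {espm_gt_strength(matrix_element_quenching=1.0):.2f}")
    print(f"renormalized strength: {espm_gt_strength():.2f}")
```

With a full proton g9/2 shell and an empty neutron g7/2 shell, the bare value is 160/9 ≈ 17.8 and the renormalized value is 10, in line with the figure quoted in the text.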

The production of ¹⁰⁰Sn, and the study of its decay properties, has
been the aim of several experiments''~’, but in these only a few ¹⁰⁰Sn
nuclei were uniquely identified. Here we report a new measurement of
the half-life and Q_EC value from 259 identified ¹⁰⁰Sn nuclei, which
yields the smallest log(ft) value of any known β-decay (here t is the
half-life and f is a phase space factor that takes into account the trivial
decay energy dependence of the half-life). The Gamow-Teller
strength is inversely proportional to ft and thus is greatest for the
¹⁰⁰Sn decay, establishing the robustness of the N = Z = 50 shell closures.
The experimentally observed Gamow-Teller strength is well
described in modern, large-scale shell model (LSSM) calculations,
which are able to handle an unprecedentedly large degree of
configuration mixing in the case of ¹⁰⁰Sn. This ¹⁰⁰Sn doubly magic shell
closure is the benchmark for various topics currently discussed in this
mass region, such as spin-aligned pairing in N = Z nuclei, alpha
clustering and quadrupole collectivity in the Sn isotopic chain.


Experimental details 


The experiment was performed at the GSI Helmholtzzentrum für
Schwerionenforschung, Germany. A ¹²⁴Xe beam with a kinetic energy
of 1.0A GeV (A, nucleon number) and 1-s-long spills of 3 × 10⁹ ions
every 3 s was directed onto a beryllium target placed in front of the
fragment separator'’. Neutron-deficient nuclei were produced
through relativistic projectile fragmentation, transmitted to the final
focal plane of the fragment separator, and identified event by event
(Fig. 1). The correct identification was verified by observing the
γ-radiation depopulating known isomers, for example the 8⁺ isomer
in ⁹⁸Cd. In total, 259 ¹⁰⁰Sn nuclei were unambiguously identified.
Figure 1 | Particle identification plot. Events are plotted with respect to Z and
the mass-to-charge ratio, A/Z, for the full statistics of the ¹⁰⁰Sn fragment
separator setting. In total, 259 ¹⁰⁰Sn nuclei (those indicated in the figure) were
unambiguously identified. Resolutions (full-widths at half-maximum) in mass
of ΔA = 0.32 (A = N + Z) and in nuclear charge of ΔZ = 0.25 were obtained.
The colours indicate the number of events per bin in a logarithmic scale as
indicated on the right-hand side.
This corresponds to a production rate of 0.75 per hour and a
cross-section of 5.8 ± 2.1 pb. All uncertainties correspond to one standard
deviation.

The ions were implanted into a stack of highly segmented silicon 
strip detectors surrounded by the RISING array, which consists of 105 
germanium detectors arranged in the stopped-beam configuration'® 
to detect γ-rays with high efficiency. Of the 259 identified ¹⁰⁰Sn nuclei,
163 were stopped in the 2.1-mm-thick implantation layer. 


Analysis and results 

Following a ¹⁰⁰Sn implantation in a pixel of the implantation zone of
the silicon detector, all decay events were recorded that occurred
within 15 s in that pixel or in the directly neighbouring ones.
During this correlation time, it was possible to assign 126 decay chains
to the 163 ¹⁰⁰Sn implantations. A maximum-likelihood (MLH)
analysis with a maximum of three decay events during the correlation
time was used to analyse these decay chains. The half-life of ¹⁰⁰Sn was
deduced to be 1.16 ± 0.20 s in the MLH analysis using established
values for the lifetimes of the daughter nuclei. The measurement is
much more precise than previous experiments yielding 0.94 s
(ref. 14) and 0.55 s (ref. 16). In Fig. 2, we show the decay curve
for ¹⁰⁰Sn.
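
The idea behind a maximum-likelihood half-life fit can be illustrated with a deliberately simplified sketch. It assumes a single exponential decay observed only inside a fixed correlation window, and it ignores the daughter decays and random background that the analysis described above takes into account. The function names and the simulated data are hypothetical and are not taken from the experiment.

```python
import math
import random


def half_life_mle(decay_times_s, window_s):
    """Maximum-likelihood half-life for decays seen only within [0, window_s].

    For a truncated exponential the likelihood is maximized where
        1/lam - T*exp(-lam*T) / (1 - exp(-lam*T)) = mean(t),
    which is solved here by bisection on the decay constant lam.
    """
    mean_t = sum(decay_times_s) / len(decay_times_s)
    if not 0.0 < mean_t < window_s / 2.0:
        raise ValueError("mean decay time must lie in (0, window/2)")

    def score(lam):
        truncation = window_s * math.exp(-lam * window_s) / (-math.expm1(-lam * window_s))
        return 1.0 / lam - truncation - mean_t

    lo, hi = 1e-9, 1.0 / mean_t        # score(lo) > 0 and score(hi) <= 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid                   # root lies at a larger decay constant
        else:
            hi = mid
    return math.log(2.0) / (0.5 * (lo + hi))


def simulate_decay_times(half_life_s, window_s, n, seed=0):
    """Draw n illustrative decay times from an exponential truncated at window_s."""
    rng = random.Random(seed)
    lam = math.log(2.0) / half_life_s
    inside = -math.expm1(-lam * window_s)   # probability of decaying inside the window
    return [-math.log1p(-rng.random() * inside) / lam for _ in range(n)]


if __name__ == "__main__":
    times = simulate_decay_times(half_life_s=1.16, window_s=15.0, n=126, seed=1)
    print(f"estimated half-life: {half_life_mle(times, 15.0):.2f} s")
```

With only about a hundred events, as in the measurement, the statistical scatter of such an estimate is of the order of 10%, which is comparable to the uncertainty quoted for the measured half-life.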

Figure 3 shows the γ-ray spectrum observed in coincidence with decay
events following ¹⁰⁰Sn implantations. Notably, discrete γ-transitions
from the ¹⁰⁰Sn decay could be observed. The five transitions denoted
in Fig. 3 are associated with the depopulation of excited states in the
daughter nucleus ¹⁰⁰In.

The statistics were sufficient only to establish a coincidence between
the 436-keV and 96-keV transitions, and it is thus impossible to deduce
an unambiguous level scheme for ¹⁰⁰In. Within the uncertainties, the
transitions could have the same intensity, which would allow for a
single cascade of five transitions from the excited 1⁺ state to the ground
state. However, this would lead to an excitation energy of more than
4 MeV, which is higher than the value of about 2.5 MeV predicted with
realistic shell model calculations (see, for example, refs 19, 20). The
large uncertainties in the observed intensities also allow for two parallel
cascades originating from this 1⁺ state.

Figure 4 shows the relevant level scheme for ¹⁰⁰In obtained from
LSSM calculations. In this approach, ¹⁰⁰Sn is not treated as an inert,
doubly magic core; instead, excitations across the N = Z = 50 shell
closures are allowed within the fifth (4ħω) harmonic oscillator shell
(Methods Summary and Supplementary Information).

The states of two multiplets that are relevant for the decay are
shown. The states originate in the coupling of proton (π) holes in
the g9/2 orbital either to neutron (ν) particles in the g7/2 orbital
(πg9/2⁻¹ ⊗ νg7/2) with total spin and parity I^π = 1⁺–8⁺ or to neutron
Figure 2 | Time distribution of first decay events. The histogram shows the
observed time distribution of all first decay events in the nearest-neighbouring
pixels after implantation of ¹⁰⁰Sn nuclei. Decay curves resulting from the MLH
analysis are shown individually for ¹⁰⁰Sn (dashed) and its daughter nucleus
¹⁰⁰In (dash-dot). The solid line shows the sum of these decay curves and takes
into account a small amount of random background.
Figure 3 | Spectrum of γ-radiation. Energy distribution of the γ-radiation
observed within 4 s after implantation of ¹⁰⁰Sn. With 65% probability these are
directly following the ¹⁰⁰Sn decay. The other contributions are uncorrelated
background decays and daughter decays of ¹⁰⁰In. None of the observed lines
corresponds to known transitions from these minor contributions. The line at
511 keV is due to positron annihilation radiation. The measured absolute
numbers of transitions of the five lines with the energies 96, 141, 436, 1,297 and
2,048 keV are respectively 79 ± 40, 100 ± 31, 59 ± 22, 72 ± 26 and 53 ± 26
corrected for electron conversion assuming M1 (magnetic dipole) transitions.
Inset, enlarged view of the energy range up to 500 keV.
particles in the d5/2 orbital (πg9/2⁻¹ ⊗ νd5/2) with I^π = 2⁺–7⁺. The
predictions reflect the observed γ-ray transitions well if the high-energy,
2,048-keV, transition populates the lowest 2⁺ state, this then decays to
lower-lying states via the 436-, 141- and 96-keV transitions, and the
decay chain ends either in the 6⁺ ground state or a low-lying isomeric
state with unobserved decay. In this picture, the second 2⁺ state is
populated by the 1,297-keV transition and decays to the lower-lying
2⁺ and 3⁺ states. This may lead to a fragmentation of the intensities,
making it impossible to observe these transitions in the present
experiment. This picture is supported by three experimental facts:
the measurement of the total γ-ray energy (E(1⁺) = 2.76 ± 0.43 MeV)
in a previous experiment with a bismuth germanate detector’; the
known mass difference between ¹⁰⁰Sn and ¹⁰⁰In!, combined with our
Figure 4 | Tentative level scheme of ¹⁰⁰In. a, LSSM calculation of the
low-lying excited states in ¹⁰⁰In. Spin and parity (I^π) are shown on the left and
energy (keV) is shown on the right. Populated levels with an almost pure
πg9/2⁻¹ ⊗ νg7/2 configuration are indicated with bold lines, and the remaining
levels are part of the πg9/2⁻¹ ⊗ νd5/2 multiplet. Gamma-transitions with their
relative intensities (in per cent and indicated by arrow width) are shown for
measured β-decay end-point energy (E(1⁺) = 2.6 ± 1.0 MeV); and our
observation of a single event of β-delayed proton emission
(E(1⁺) = 2.93 ± 0.34 MeV). It is fully consistent with the expectation
that a single 1⁺ state is dominantly populated in the decay. Further
details are given in Methods Summary and Supplementary
Information.
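
The energy bookkeeping behind this argument can be checked directly. The sketch below sums the five observed transition energies for the single-cascade hypothesis, sums the four transitions of the main branch described above, and compares the latter with the three quoted energy estimates. It ignores the possibility of a missed low-energy transition; variable names are illustrative.

```python
# Observed gamma-ray transition energies in keV, as listed in the text.
transitions_kev = [96, 141, 436, 1297, 2048]

# Hypothesis 1: all five transitions form a single cascade.
single_cascade_kev = sum(transitions_kev)

# Hypothesis 2: the 2,048-keV transition feeds the lowest 2+ state, which
# then de-excites through the 436-, 141- and 96-keV transitions.
main_branch_kev = 2048 + 436 + 141 + 96

# Energy estimates quoted in the text, as (value, one-sigma uncertainty) in MeV.
estimates_mev = [(2.76, 0.43), (2.6, 1.0), (2.93, 0.34)]
within_one_sigma = [
    abs(main_branch_kev / 1000.0 - value) <= sigma
    for value, sigma in estimates_mev
]

if __name__ == "__main__":
    print(f"single cascade: {single_cascade_kev} keV")
    print(f"main branch:    {main_branch_kev} keV")
    print(f"within one sigma of each estimate: {within_one_sigma}")
```

The single cascade adds up to 4,018 keV, which is the "more than 4 MeV" that conflicts with the shell-model expectation, whereas the main branch adds up to 2,721 keV and lies within one standard deviation of all three estimates.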

As a key feature of this experiment, we measured the kinetic energy
of the decay positrons fully absorbed in the compact silicon detector
array. The spectrum resulting from the summed energies deposited by
a β-particle in the pixels of the calorimeter up to 3 s after a ¹⁰⁰Sn
implantation is shown in Fig. 5. It was fitted using a MLH analysis
based on a single-component β-decay phase space function to determine
the end-point energy in the decay of ¹⁰⁰Sn. For the fit of the end-point
energy, only data in the energy region between 400 and
2,600 keV were used. In the analysis, corrections were applied to
account for the emission of conversion electrons instead of low-energy
γ-rays during the de-excitation of the daughter nucleus ¹⁰⁰In, for
bremsstrahlung emitted when the positrons are slowed down and for
the annihilation of positrons in flight before the deposition of their
total kinetic energy. The end-point energy of the β-decay, if populating
a single final state in the daughter nucleus ¹⁰⁰In, was determined to be
3.29 ± 0.20 MeV. The corresponding fraction of electron-capture
decays is 13% of all ¹⁰⁰Sn decays.
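
The measured end-point energy can be related to the Q_EC value with the standard energy balance for β⁺-decay to an excited state: Q_EC = E_end-point + 2mₑc² + E_x. The sketch below is an illustration under one assumption that the article does not state in this form, namely that the excitation energy E_x of the populated 1⁺ state equals the sum of the 2,048-, 436-, 141- and 96-keV transitions.

```python
ELECTRON_MASS_MEV = 0.510999  # electron rest energy in MeV


def q_ec_from_endpoint(endpoint_mev, excitation_mev):
    """Q_EC for beta-plus decay feeding a state at excitation_mev.

    The positron end-point energy is the kinetic energy left after paying
    for the excitation of the daughter and for 2 electron rest masses.
    """
    return endpoint_mev + 2.0 * ELECTRON_MASS_MEV + excitation_mev


# Assumed excitation energy: sum of four observed transitions (keV -> MeV).
excitation_mev = (2048 + 436 + 141 + 96) / 1000.0
q_ec_mev = q_ec_from_endpoint(3.29, excitation_mev)

if __name__ == "__main__":
    print(f"Q_EC = {q_ec_mev:.2f} MeV")
```

Under this assumption the result is about 7.03 MeV, which matches the experimental Q_EC window of 7.03(20) MeV quoted in the Discussion.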


Discussion 


Using the measured half-life and the end-point energy, we calculated 
a log(ft) value of 2.62, which is the smallest such value found so
far for any nuclear β-decay. Thus, the Gamow-Teller decay of ¹⁰⁰Sn
has a much larger strength than the known 0⁺ → 0⁺ superallowed
Fermi decays of N = Z nuclei and can indeed be considered a super- 
allowed Gamow-Teller decay”'. This finding is also illustrated in 
Fig. 6, which shows the distribution of log(ft) values for allowed 
Gamow-Teller and Fermi transitions. 
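
Because the Gamow-Teller strength is inversely proportional to ft, a log(ft) value can be converted into a strength with one line of arithmetic. The sketch below assumes the commonly used relation B_GT = D / ((g_A/g_V)² · ft) for a pure Gamow-Teller transition, with D ≈ 6144 s and g_A/g_V ≈ 1.27; these constants are assumptions for illustration and are not quoted in the article.

```python
def gamow_teller_strength(log_ft, d_seconds=6144.0, ga_over_gv=1.27):
    """Convert log10(ft) of a pure Gamow-Teller decay into B_GT.

    Assumed relation: B_GT = D / ((gA/gV)**2 * ft), with ft in seconds.
    """
    ft_seconds = 10.0 ** log_ft
    return d_seconds / (ga_over_gv ** 2 * ft_seconds)


if __name__ == "__main__":
    print(f"B_GT for log(ft) = 2.62: {gamow_teller_strength(2.62):.1f}")
```

For log(ft) = 2.62 this gives a strength of about 9.1. Each decrease of log(ft) by 0.1 raises the strength by roughly 26%, which is why the end-point energy uncertainty, entering through f, dominates the uncertainty of the extracted strength.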

The extracted Gamow-Teller strength of the ¹⁰⁰Sn ground-state
decay to the single excited 1⁺ state in ¹⁰⁰In is B_GT = 9.1. The
measured value is extraordinarily large but consistent with the value
of B_GT = 5.8 deduced from previous results for Q_EC and half-life”,
within the large error bars of the earlier measurement. The uncertainty
in the new B_GT value is dominated by the uncertainty in the β-decay
end-point energy. The extraction of the strength was done under the
assumption that the Gamow-Teller decay was into only one final 1⁺
selected transitions. b, Most likely level scheme for the five observed
γ-transitions in ¹⁰⁰In (three with energies (keV) shown). Because one
low-energy transition might have been missed, the energy of the levels might have a
systematic shift of up to x = 80 keV. The dashed transitions and level z were not
observed. The assignment of spin and parity is certain only for the 1⁺ state; the
others are tentative assignments based on theory.
Figure 5 | Distribution of the positron energies emitted in the β-decay of
¹⁰⁰Sn. The spectrum contains only decay events that can be assigned to ¹⁰⁰Sn
decays with a probability of at least 75%. The MLH fit was applied to the region
between 400 and 2,600 keV, which is indicated with markers. The solid curve
illustrates the shape of the best-fitting single-component β-decay phase space
function determined by MLH analysis.


state in ¹⁰⁰In. However, if 1⁺ states at excitation energies above the
observed state were also populated, the summed Gamow-Teller
strength would increase while the strength of the decay into the first
excited 1⁺ state would decrease. It would have been difficult to observe
such higher-energy states because the reduced phase space for
lower-energy β⁺-particles would have led to a strongly reduced population.
The LSSM calculations, which within the gds harmonic oscillator shell
take into account most of the long-range correlations across the
N = Z = 50 doubly magic shell closure and include up to five
particle-hole excitations (Methods Summary and Supplementary Information),
yield a total summed Gamow-Teller strength of B_GT = 8.19 for all
possible final states in the daughter nucleus up to 60 MeV. The standard
renormalization due to correlations beyond the 0g, 1d, 2s model space
has been applied’. The distribution of strength up to an excitation energy
of 10 MeV is shown in Methods Summary and Supplementary Fig. 1.
A Gamow-Teller strength of B_GT = 7.82 is predicted in the
experimental Q_EC window of 7.03(20) MeV. This corresponds to a reduction
Figure 6 | Log(ft) values of allowed nuclear β-decays. Number distribution of
log(ft) values for allowed β-transitions (obeying the selection rules). The data
are from ref. 26. The values are for generally allowed Gamow-Teller transitions
between 0⁺ and 1⁺ states (black), mixed Fermi/Gamow-Teller transitions
(blue) and the well-established pure, superallowed Fermi transitions from 0⁺ to
0⁺ states (green). The decay of ¹⁰⁰Sn is unique because it has the smallest known
log(ft) value (red) of any nuclear β-decay.
in the total renormalized Gamow-Teller strength of the extreme
single-particle estimate (B_GT(ESPM) ≈ 10) by 18% for excitation
energies up to 60 MeV and by 22% in the Q_EC window. It is due to
mixing in the gds harmonic oscillator shell, that is, emptying of the
proton g9/2 orbital, pre-filling of the neutron g7/2 orbital and destructive
interference of the four possible combinations of Gamow-Teller
transitions within the g orbital (L = 4). The occupation numbers of
the two orbitals that are linked by the Gamow-Teller operator, which
acts on spin and isospin but does not change L, directly influence the
strength of the transition matrix element.
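
The two reduction percentages follow from the numbers already given: the LSSM strengths of 8.19 (all final states up to 60 MeV) and 7.82 (inside the Q_EC window), each compared with the single-particle estimate of about 10. A minimal check, with an illustrative helper name:

```python
def reduction_percent(reference, value):
    """Percentage by which value falls below reference."""
    return 100.0 * (reference - value) / reference


ESPM_STRENGTH = 10.0  # renormalized extreme single-particle estimate

reduction_total = reduction_percent(ESPM_STRENGTH, 8.19)   # up to 60 MeV
reduction_window = reduction_percent(ESPM_STRENGTH, 7.82)  # within the Q_EC window

if __name__ == "__main__":
    print(f"up to 60 MeV:       {reduction_total:.0f}%")
    print(f"within Q_EC window: {reduction_window:.0f}%")
```

The rounded results are 18% and 22%, as stated in the text.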

The calculation predicts that the largest fraction of the strength
remains located in the first excited 1⁺ state, in agreement with earlier
calculations”*'. Nevertheless, according to our LSSM calculations it is
reasonable to consider that several 1⁺ states in ¹⁰⁰In are populated in
the decay of ¹⁰⁰Sn. If, as an exercise, we take from the LSSM calculation
the four lowest 1⁺ states in ¹⁰⁰In with their energy splittings and
relative Gamow-Teller strengths (Methods Summary and
Supplementary Information), the value of B_GT(1₁⁺) = 9.1 (assuming
a single 1⁺ state) would be reduced to B_GT(1₁⁺) = 7.6 for the
first excited 1⁺ state using the experimental half-life and β-spectrum.
The corresponding summed Gamow-Teller strength would be
ΣB_GT(1ᵢ⁺) = 9.9. Because this exercise served only to gauge
the effect of branching on the experimental B_GT value, no error for the
branching ratios is included.

The LSSM result of B_GT = 5.7 for this first excited 1⁺ state agrees
within the statistical uncertainty with the value, B_GT = 7.6,
extracted from the experimental log(ft) value under the above
assumptions. The experimental concentration of most of the
Gamow-Teller strength in the first excited 1⁺ state clearly classifies the ¹⁰⁰Sn
Gamow-Teller decay as superallowed. This large experimental
Gamow-Teller strength of the transition to the first excited 1⁺ state
proves that both the ¹⁰⁰Sn ground state and the first excited 1⁺ state in
¹⁰⁰In have relatively pure wavefunctions. As expected, the LSSM calculation
reveals that the respective wavefunctions consist predominantly
of the πg9/2¹⁰ ⊗ νg9/2¹⁰ (82% probability) and πg9/2⁻¹ ⊗ νg7/2 (54%
probability) components. The high purity of the wavefunctions within 
the gds model space establishes the simultaneous robustness of the 
Z = 50 and N = 50 shell closures in ¹⁰⁰Sn, which is only ~3 MeV from
the proton drip line, corroborating for N = 50 the results of refs 23, 24, 
and excludes the need for explicitly treating the unbound proton orbits 
as continuum states. 

The LSSM calculations allow enough configuration mixing in the 
gds shell that convergent results are obtained, leading to meaningful 
conclusions in this exotic region far from the valley of stability. This 
indicates that it should be possible to obtain reliable, more accurate 
results for nuclei in the neighbourhood of ¹⁰⁰Sn, especially close to the
proton drip line. 

The underlying shell structures of nuclei in the vicinity of ¹⁰⁰Sn
have to be determined with the highest possible accuracy to address 
the important issues in nuclear structure, such as the possibility of a 
new coupling scheme developing in the N = Z nuclei in the vicinity of 
¹⁰⁰Sn (ref. 25). The present measurement is a stringent test for LSSM
calculations, in which the realistic character of such a coupling 
scheme still needs to be probed. 

A better understanding of the nuclear structure is of major import- 
ance for modelling weak interaction rates in nuclei, which depend on 
the underlying shell structure and are important in many astrophysical 
processes. For example, Gamow-Teller transitions govern electron 
capture during the core collapse of supernovae. Also, Gamow-Teller 
transitions are an essential constraint on the theoretical calculations of 
neutrinoless double-β-decay matrix elements, the knowledge of
which is necessary to relate the neutrino mass to the rate of this yet 
undiscovered process. Further interest in the decay rates of nuclei 
around 100Sn comes from the study of certain astrophysical processes,
as this region has been considered the end of the rapid proton capture 
process due to the Sn-Sb-Te cycle. 
METHODS SUMMARY


100Sn and neighbouring nuclei were produced by fragmentation of a 1.0 A GeV
124Xe beam from the GSI accelerators, separated in the fragment separator and
identified by multiple energy-loss, magnetic rigidity and time-of-flight
measurements. The nuclei were stopped in an implantation detector with high spatial
resolution to correlate implantations with succeeding decays. The device was
surrounded by the stopped-beam RISING array of 15 × 7 germanium detectors
in close geometry. In this configuration, the set-up enabled us to do nearly 4π
spectroscopy of the emitted γ-radiation and particle-decay radiation. With a
photopeak efficiency of about 10% (1 MeV) for γ-ray detection and nearly
100% for full energy detection of decay particles up to 5 MeV, this high-resolution
set-up allowed for a maximum use of the secondary beam.

The 100Sn half-life and the β-decay end-point energy were calculated in the
framework of a maximum-likelihood (MLH) analysis applied respectively to the time distribution of
β-decays after implantation and the energy distribution of emitted positrons. This
analysis also considered the daughter decays and the presence of uncorrelated 
random background decays from previous implantations. 
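As a rough illustration of the maximum-likelihood idea behind the half-life extraction, the sketch below fits a single exponential decay to a list of implantation-to-decay time differences. It is a deliberately simplified assumption: unlike the analysis described above, it ignores daughter decays and uncorrelated background, and the function name `half_life_mle` is hypothetical.

```python
import math


def half_life_mle(decay_times):
    """Maximum-likelihood half-life for a pure exponential decay.

    decay_times: positive time differences between implantation and decay.
    Simplifying assumption: no daughter decays and no random background,
    both of which the full analysis in the text does account for.
    Returns (half_life, approximate_standard_error).
    """
    if not decay_times:
        raise ValueError("need at least one decay time")
    n = len(decay_times)
    total = sum(decay_times)
    if total <= 0:
        raise ValueError("decay times must be positive")
    decay_constant = n / total            # MLE of lambda for an exponential
    half_life = math.log(2) / decay_constant
    # Asymptotic relative uncertainty of the MLE is 1/sqrt(n).
    return half_life, half_life / math.sqrt(n)
```

With only a handful of observed decays, the asymptotic error estimate is crude; a full treatment would scan the likelihood function directly.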

To interpret the measured Gamow-Teller strength and the observed γ-rays
emitted from 100In, we carried out LSSM calculations. The valence space used in
the LSSM consists of the fifth (4ħω) harmonic oscillator shell, that is, proton and
neutron πν(g,d,s) orbitals outside the 80Zr core. The calculations included up to
five particle-hole excitations from the g9/2 proton and neutron orbitals to the rest
of the shell, which made it possible for us to obtain convergent results for
excitation spectra and the Gamow-Teller strength.
The genomic and transcriptomic
architecture of 2,000 breast tumours 
reveals novel subgroups 


Christina Curtis, Sohrab P. Shah, Suet-Feung Chin, Gulisa Turashvili, Oscar M. Rueda, Mark J. Dunning,
Doug Speed, Andy G. Lynch, Shamith Samarajiwa, Yinyin Yuan, Stefan Gräf, Gavin Ha, Gholamreza Haffari,
Ali Bashashati, Roslin Russell, Steven McKinney, METABRIC Group, Anita Langerød, Andrew Green, Elena Provenzano,
Gordon Wishart, Sarah Pinder, Peter Watson, Florian Markowetz, Leigh Murphy, Ian Ellis, Arnie Purushotham,
Anne-Lise Børresen-Dale, James D. Brenton, Simon Tavaré, Carlos Caldas & Samuel Aparicio


The elucidation of breast cancer subgroups and their molecular drivers requires integrated views of the genome and 
transcriptome from representative numbers of patients. We present an integrated analysis of copy number and gene 
expression in a discovery and validation set of 997 and 995 primary breast tumours, respectively, with long-term clinical 
follow-up. Inherited variants (copy number variants and single nucleotide polymorphisms) and acquired somatic copy 
number aberrations (CNAs) were associated with expression in ~40% of genes, with the landscape dominated by cis- 
and trans-acting CNAs. By delineating expression outlier genes driven in cis by CNAs, we identified putative cancer 
genes, including deletions in PPP2R2A, MTAP and MAP2K4. Unsupervised analysis of paired DNA-RNA profiles revealed
novel subgroups with distinct clinical outcomes, which reproduced in the validation cohort. These include a high-risk, 
oestrogen-receptor-positive 11q13/14 cis-acting subgroup and a favourable prognosis subgroup devoid of CNAs. 
Trans-acting aberration hotspots were found to modulate subgroup-specific gene networks, including a TCR 
deletion-mediated adaptive immune response in the ‘CNA-devoid’ subgroup and a basal-specific chromosome 5 
deletion-associated mitotic network. Our results provide a novel molecular stratification of the breast cancer
population, derived from the impact of somatic CNAs on the transcriptome.


Inherited genetic variation and acquired genomic aberrations contrib- 
ute to breast cancer initiation and progression. Although somatically 
acquired CNAs are the dominant feature of sporadic breast cancers, the 
driver events that are selected for during tumorigenesis are difficult to 
elucidate as they co-occur alongside a much larger landscape of random 
non-pathogenic passenger alterations and germline copy number
variants (CNVs). Attempts to define subtypes of breast cancer and to 
discern possible somatic drivers are still in their relative infancy, in 
part because breast cancer represents multiple diseases, implying that 
large numbers (many hundreds or thousands) of patients must be 
studied. Here we describe an integrated genomic/transcriptomic 
analysis of breast cancers with long-term clinical outcomes composed 
of a discovery set of 997 primary tumours and a validation set of 995 
tumours from METABRIC (Molecular Taxonomy of Breast Cancer 
International Consortium). 


A breast cancer population genomic resource 


We assembled a collection of over 2,000 clinically annotated primary 
fresh-frozen breast cancer specimens from tumour banks in the UK
and Canada (Supplementary Tables 1-3). Nearly all oestrogen receptor
(ER)-positive and/or lymph node (LN)-negative patients did not receive 
chemotherapy, whereas ER-negative and LN-positive patients did. 
Additionally, none of the HER2+ patients received trastuzumab. As such,
the treatments were homogeneous with respect to clinically relevant 
groupings. An initial set of 997 tumours was analysed as a discovery group 
and a further set of 995 tumours, for which complete data later became 
available, was used to test the reproducibility of the integrative clusters 
(described below). An overview of the main analytical approaches is 
provided in Supplementary Fig. 1. Details concerning expression and 
copy number profiling, including sample assignment to the PAM50 
intrinsic subtypes (Supplementary Fig. 2), copy number analysis
(Supplementary Tables 4-8) and validation (Supplementary Figs 3 and 
4 and Supplementary Tables 9-11), and TP53 mutational profiling 
(Supplementary Fig. 5) are described in the Supplementary Information. 


Genome variation affects tumour expression architecture 


Genomic variants are considered to act in cis when a variant at a locus 
has an impact on its own expression, or in trans when it is associated 
with genes at other sites in the genome. We generated a map of CNAs,
CNVs (Supplementary Fig. 6, Supplementary Tables 12-15) and 
single nucleotide polymorphisms (SNPs) in the breast cancer genome 
to distinguish germline from somatic variants (see Methods), and 
to examine the impact of each of these variants on the expression 
landscape. Previous studies have shown that most heritable gene
expression traits are governed by a combination of cis (proximal) loci, 
defined here as those within a 3-megabase (Mb) window surrounding 
the gene of interest, and trans (distal) loci, defined here as those 
outside that window. We assessed the relative influence of SNPs, 
CNVs and CNAs on tumour expression architecture, using each of 
these variants as a predictor (see Methods) to elucidate expression 
quantitative trait loci (eQTLs) among patients. 
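The cis/trans distinction above can be sketched as a simple positional rule. The snippet assumes that the 3-Mb window is centred on the gene midpoint (1.5 Mb on either side), which the text does not specify; the function name `classify_locus` and its arguments are hypothetical.

```python
CIS_WINDOW_BP = 3_000_000  # total window size stated in the text


def classify_locus(variant_chrom, variant_pos, gene_chrom, gene_start,
                   gene_end, window_bp=CIS_WINDOW_BP):
    """Label a variant as 'cis' or 'trans' relative to a gene.

    Assumption: the window is centred on the gene midpoint, so a variant
    is cis if it lies within window_bp / 2 of that midpoint on the same
    chromosome. Everything else is treated as trans.
    """
    if variant_chrom != gene_chrom:
        return "trans"
    midpoint = (gene_start + gene_end) / 2
    if abs(variant_pos - midpoint) <= window_bp / 2:
        return "cis"
    return "trans"
```

Coordinates are expected in base pairs on a common genome build; any other windowing convention (for example, measuring from the gene boundaries) would only change the distance computation.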

Both germline variants and somatic aberrations were found to 
influence tumour expression architecture, having an impact on 
>39% (11,198/28,609) of expression probes genome-wide based on 
analysis of variance (ANOVA; see Methods), with roughly equal 
numbers of genes associated in cis and trans. CNAs were associated 
with the greatest number of expression profiles (Fig. 1, Supplementary 
Figs 7-13 and Supplementary Tables 16-20), but were rivalled by 
SNPs to explain a greater proportion of expression variation on a 
per-gene basis genome-wide, whereas the contribution from CNVs 
was more moderate (Fig. 1b and Supplementary Table 21). The true 
ratio of putative trans versus cis eQTLs is hard to estimate; however,
the large sample size used here allowed the detection of small effects,
with 5,401 and 5,462 CNAs significantly (Šidák adjusted P value
<0.0001) associated in cis or in trans, respectively. Whereas
cis-associations tended to be stronger, the trans-acting loci modulated
a larger number of messenger RNAs, as described below. 
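For readers unfamiliar with the Šidák adjustment mentioned above, the following sketch shows the standard formula, p_adj = 1 - (1 - p)^m, for m independent tests. The number of tests actually used in the study is not given in this section, so `m` is left as a parameter and the function name is hypothetical.

```python
import math


def sidak_adjust(p, m):
    """Sidak-adjusted P value for one of m independent tests.

    Computes 1 - (1 - p)**m using log1p/expm1 so that very small raw
    P values do not lose precision.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if m < 1:
        raise ValueError("m must be at least 1")
    if p == 1.0:
        return 1.0
    return -math.expm1(m * math.log1p(-p))
```

An association would then be called significant under the threshold quoted in the text when `sidak_adjust(p, m) < 0.0001`.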


Expression outliers refine the breast cancer landscape 

As shown above, ~20% of loci exhibit CNA-expression associations in 
cis (Supplementary Fig. 14). To refine this landscape further and identify 
the putative driver genes, we used profiles of outlying expression (see 
Methods and ref. 10) and the high resolution and sensitivity of the 
Affymetrix SNP 6.0 platform to delineate candidate regions. This
approach markedly reduces the complexity of the landscape to 45 regions 
(frequency > 5, Fig. 2) and narrows the focus, highlighting novel regions 
that modulate expression. The full enumeration of regions delineated by 
this approach and their subtype-specific associations (Supplementary 
Figs 15 and 16 and Supplementary Tables 22-24) includes both known 
drivers (for example, ZNF703 (ref. 11), PTEN (ref. 12), MYC, CCND1, 
MDM2, ERBB2, CCNE1 (ref. 13)) and putative driver aberrations (for 
example, MDM1, MDM4, CDK3, CDK4, CAMK1D, PI4KB, NCOR1).

The deletion landscape of breast cancer has been poorly explored, 
with the exception of PTEN. We illustrate three additional regions of 
significance centred on PPP2R2A (8p21, Fig. 2, region 11), MTAP 
(9p21, Fig. 2, region 15) and MAP2K4 (17p11, Fig. 2, region 33), 
which exhibit heterozygous and homozygous deletions (Supplemen- 
tary Figs 15, 17-19 and Supplementary Table 24) that drive expres- 
sion of these loci. We observe breast cancer subtype-specific (enriched 
in mitotic ER-positive cancers) loss of transcript expression in 
PPP2R2A, a B-regulatory subunit of the PP2A mitotic exit holoenzyme 
complex. Somatic mutations in PPP2R1A have recently been reported 
in clear cell ovarian cancers and endometrioid cancers, and
methylation silencing of PPP2R2B has also been observed in colorectal
cancers. Thus, dysregulation of specific PPP2R2A functions in luminal
B breast cancers adds a significant pathophysiology to this subtype. 

MTAP (9p21, a component of methyladenosine salvage) is fre- 
quently co-deleted with the CDKN2A and CDKN2B tumour suppressor 
genes in a variety of cancers, as we observe here (Supplementary Figs
17c and 18). The third deletion encompasses MAP2K4 (also called
MKK4) (17p11), a p38/Jun dual specificity serine/threonine protein
kinase. MAP2K4 has been proposed as a recessive cancer gene, with
mutations noted in cell lines. We show, for the first time, the recurrent
deletion of MAP2K4 (Supplementary Figs 17d and 19) concomitant 
with outlying expression (Supplementary Fig. 15) in predominantly 
ER-positive cases, and verify homozygous deletions (Supplementary 
Table 9) in primary tumours, strengthening the evidence for MAP2K4 
as a tumour suppressor in breast cancer. 
Figure 1 | Germline and somatic variants influence tumour expression
architecture. a, Venn diagrams depict the relative contribution of SNPs, CNVs
and CNAs to genome-wide, cis and trans tumour expression variation for
significant expression associations (Šidák adjusted P-value <0.0001).
b, Histograms illustrate the proportion of variance explained by the most
significantly associated predictor for each predictor type, where several of the 
top associations are indicated. 
Figure 2 | Patterns of cis outlying expression refine putative breast cancer
drivers. A genome-wide view of outlying expression coincident with extreme 
copy number events in the CNA landscape highlights putative driver genes, as 
indicated by the arrows and numbered regions. The frequency (absolute count) 
of cases exhibiting an outlying expression profile at regions across the genome is 


Trans-acting associations reveal distinct modules 


We next asked how trans-associated expression profiles are distributed 
across the genome. We mapped these in the expression landscape by 
examining the matrices of CNA-expression associations (see Methods). 
This revealed strong off-diagonal patterns at loci on chromosomes 1q, 
7p, 8, 11q, 14q, 16, 17q and 20q (Fig. 3a), including both positive and
negative associations, as well as numerous trans-acting aberra- 
tion hotspots (defined as CNAs associated with >30 mRNAs). 
Importantly, these aberration hotspots can be grouped into pathway 
modules, which highlight known driver loci such as ERBB2 and MYC, 
as well as novel loci associated with large trans expression modules 
(Supplementary Tables 25 and 26). The T-cell-receptor (TCR) loci on 
chromosomes 7 (TRG) and 14 (TRA) represent two such hotspots that 
modulated 381 and 153 unique mRNAs, respectively, as well as 19 
dually regulated genes (Supplementary Fig. 20). These cognate 
mRNAs were highly enriched for T-cell activation and proliferation, 
dendritic cell presentation, and leukocyte activation, which indicate 
the induction of an adaptive immune response associated with 
tumour-infiltrating lymphocytes (Fig. 3b, Supplementary Fig. 20 and 
Supplementary Tables 27 and 28), as described later. 
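The hotspot definition used above (a CNA associated with more than 30 mRNAs) amounts to counting distinct targets per locus. The sketch below assumes the significant trans associations are available as (CNA locus, mRNA identifier) pairs; that input layout and the name `find_trans_hotspots` are illustrative assumptions rather than the authors' pipeline.

```python
from collections import defaultdict

HOTSPOT_MIN_MRNAS = 30  # threshold quoted in the text: more than 30 mRNAs


def find_trans_hotspots(associations, min_mrnas=HOTSPOT_MIN_MRNAS):
    """Return {cna_locus: n_unique_mrnas} for trans-acting hotspots.

    associations: iterable of (cna_locus, mrna_id) pairs, one per
    significant trans association (hypothetical input layout).
    A locus qualifies when it is linked to strictly more than
    min_mrnas distinct mRNAs.
    """
    targets = defaultdict(set)
    for cna_locus, mrna_id in associations:
        targets[cna_locus].add(mrna_id)
    return {
        locus: len(mrnas)
        for locus, mrnas in targets.items()
        if len(mrnas) > min_mrnas
    }
```

Using sets means repeated probe-level hits to the same mRNA are counted once, which matches the "unique mRNAs" wording in the paragraph.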

In a second approach, we examined the genome-wide patterns of 
linear correlation between copy number and expression features (see 
Methods), and noted the alignment of several off-diagonal signals, 
including those on chromosome 1q, 8q, 11q, 14q and 16 (Supplementary
Fig. 21). Additionally, a broad signal on chromosome 5
localizing to a deletion event restricted to the basal-like tumours was 
observed (Supplementary Fig. 21), but was not detected with the 
eQTL framework, where discrete (as opposed to continuous) copy 
number values were used. This basal-specific trans module is enriched 
for transcriptional changes involving cell cycle, DNA damage repair 
and apoptosis (Supplementary Table 29), reflecting the high mitotic 
index typically associated with basal-like tumours, described in detail 
below. 
shown, as is the distribution across subgroups for several regions in the insets.
High-level amplifications are indicated in red and homozygous deletions in
blue. Red asterisks above the bar plots indicate significantly different observed
distributions than expected based on the overall population frequency (χ² test,
P<0.0001).


Integrative clustering reveals novel subgroups 


Using the discovery set of 997 breast cancers, we next asked whether 
novel biological subgroups could be found by joint clustering of copy 
number and gene expression data. On the basis of our finding that cis- 
acting CNAs dominated the expression landscape, the top 1,000 cis- 
associated genes across all subtypes (Supplementary Table 30) were 
used as features for input to a joint latent variable framework for 
integrative clustering (see Methods). Cluster analysis suggested 10
groups (based on Dunn’s index) (see Methods and Supplementary Figs 
22 and 23), but for completeness, this result was compared with the 
results for alternative numbers of clusters and clustering schemes (see 
Methods, Supplementary Figs 23-27 and Supplementary Tables 31- 
33). The 10 integrative clusters (labelled IntClust 1-10) were typified 
by well-defined copy number aberrations (Fig. 4, Supplementary Figs 
22, 28-30 and Supplementary Tables 34-39), and split many of the 
intrinsic subtypes (Supplementary Figs 31-33). Kaplan-Meier plots of 
disease-specific survival and Cox proportional hazards models indicate 
subgroups with distinct clinical outcomes (Fig. 5, Supplementary Figs 
34, 35 and Supplementary Tables 40 and 41). To validate these results, 
we trained a classifier (754 features) for the integrative subtypes in the 
discovery set using the nearest shrunken centroids approach (see
Methods and Supplementary Tables 42 and 43), and then classified 
the independent validation set of 995 cases into the 10 groups 
(Supplementary Table 44). The reproducibility of the clusters in the 
validation set is shown in three ways. First, classification of the valid- 
ation set resulted in the assignment of a similar proportion of cases to 
the 10 subgroups, each of which exhibited nearly identical copy number 
profiles (Fig. 4). Second, the groups have substantially similar hazard 
ratios (Fig. 5b, Supplementary Fig. 35 and Supplementary Table 40). 
Third, the quality of the clusters in the validation set is emphasized by 
the in-group proportions (IGP) measure (Fig. 4).
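Dunn's index, used above to choose the number of clusters, rewards partitions whose clusters are compact and well separated. The sketch below implements the classic form (smallest between-cluster distance divided by largest within-cluster diameter) with Euclidean distance. Which variant and distance the study used is not stated here, so both are assumptions, and the function name is hypothetical.

```python
import math
from itertools import combinations


def dunn_index(clusters):
    """Classic Dunn index for a partition of points.

    clusters: list of clusters, each a list of equal-length numeric tuples.
    Returns min between-cluster distance / max within-cluster diameter;
    higher values indicate compact, well-separated clusters.
    """
    if len(clusters) < 2 or any(len(c) == 0 for c in clusters):
        raise ValueError("need at least two non-empty clusters")
    # Largest distance between two points of the same cluster.
    max_diameter = max(
        (math.dist(a, b) for c in clusters for a, b in combinations(c, 2)),
        default=0.0,
    )
    # Smallest distance between points of different clusters.
    min_separation = min(
        math.dist(a, b)
        for c1, c2 in combinations(clusters, 2)
        for a in c1
        for b in c2
    )
    if max_diameter == 0.0:
        return math.inf
    return min_separation / max_diameter
```

In practice one would compute the index for each candidate number of clusters and prefer the partition with the highest value; the pairwise loops here are quadratic and only meant for small illustrative inputs.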

Among the integrative clusters, we first note an ER-positive sub- 
group composed of 11q13/14 cis-acting luminal tumours (IntClust 2, 
Figure 3 | Trans-acting aberration hotspots modulate concerted molecular
pathways. a, Manhattan plot illustrating cis and trans expression-associated
copy number aberrations from the eQTL analysis (top panel). The matrix of
significant predictor-expression associations (adjusted P-value ≤0.0001)
exhibits strong off-diagonal patterns (middle panel), and the frequency of 
mRNAs associated with a particular copy number aberration further illuminates 
these trans-acting aberration hotspots (bottom panel). The directionality of the 
associations is indicated as follows: cis: positive, red; negative, pink; trans: 
positive, blue; negative, green. b, Enrichment map of immune response modules 
in the trans-associated TRA network, where letters in parentheses represent the 
source database as follows: b, NCI-PID BioCarta; c, cancer cell map; k, KEGG; 
n, NCI-PID curated pathways; p, PANTHER; r, Reactome. 


n = 45) that harbour other common alterations. This subgroup
exhibited a steep mortality trajectory with elevated hazard ratios 
(discovery set: 3.620, 95% confidence interval (1.905-6.878); valid- 
ation set: 3.353, 95% confidence interval (1.381-8.141)), indicating 
that it represents a particularly high-risk subgroup. Several known 
and putative driver genes reside in this region, namely CCND1 
(11q13.3), EMSY (11q13.5), PAK1 (11q14.1) and RSF1 (11q14.1),
which have been previously linked to breast or ovarian cancer.
Both the copy number (Fig. 4) and expression outlier landscapes 
(Fig. 2) suggest at least two separate amplicons at 11q13/14, one at 
CCND1 (11q13.3) and a separate peak from 11q13.5-11q14.1 spanning 
UVRAG-GAB2, centred around PAK1, RSF1, C11orf67 and INTS4,
where it is more challenging to distinguish the driver. Notably, the
expression outlier profiles for this region are enriched for samples
belonging to IntClust 2 (Fig. 2, inset region 23) and all 45 members 
of this subgroup harboured amplifications of these genes, with high 
frequencies of amplification also observed for CCND1 (n = 39) and 
EMSY (n = 34). In light of these observations, the 11q13/14 amplicon 
may be driven by a cassette of genes rather than a single oncogene. 

Second, we note the existence of two subgroups marked by a paucity 
of copy number and cis-acting alterations. These subgroups cannot be 
explained by low cellularity tumours (see Methods). One subgroup 
(IntClust 3, n = 156) with low genomic instability (Fig. 4 and
Supplementary Fig. 22) was composed predominantly of luminal A cases,
and was enriched for histotypes that typically have good prognosis, 
including invasive lobular and tubular carcinomas. The other sub- 
group (IntClust 4, n = 167) was also composed of favourable outcome 
cases, but included both ER-positive and ER-negative cases and varied 
intrinsic subtypes, and had an essentially flat copy number landscape, 
hence termed the ‘CNA-devoid’ subgroup. A significant proportion of 
cases within this subgroup exhibit extensive lymphocytic infiltration 
(Supplementary Table 45). 

Third, several intermediate prognosis groups of predominantly 
ER-positive cancers were identified, including a 17q23/20q cis-acting 
luminal B subgroup (IntClust 1, n = 76), an 8p12 cis-acting luminal 
subgroup (IntClust 6, n= 44), as well as an 8q cis-acting/20q- 
amplified mixed subgroup (IntClust 9, n = 67). Two luminal A sub- 
groups with similar CNA profiles and favourable outcome were 
noted. One subgroup is characterized by the classical 1q gain/16q loss 
(IntClust 8, n = 143), which corresponds to a common translocation 
event, and the other lacks the 1q alteration, while maintaining the
16p gain/16q loss with higher frequencies of 8q amplification 
(IntClust 7, n = 109). We also noted that the majority of basal-like 
tumours formed a stable, mostly high-genomic instability subgroup 
(IntClust 10, n = 96). This subgroup had relatively good long-term 
outcomes (after 5 years), consistent with ref. 26, and characteristic cis- 
acting alterations (5 loss/8q gain/10p gain/12p gain). 

The ERBB2-amplified cancers composed of HER2-enriched (ER- 
negative) cases and luminal (ER-positive) cases appear as IntClust 5 
(n = 94), thus refining the ERBB2 intrinsic subtype by grouping addi- 
tional patients that might benefit from targeted therapy. Patients in 
this study were enrolled before the general availability of trastuzumab, 
and as expected this subgroup exhibits the worst disease-specific sur- 
vival at both 5 and 15 years and elevated hazard ratios (discovery set: 
3.899, 95% confidence interval (2.234-6.804); validation set: 4.447, 
95% confidence interval (2.284-8.661)). 


Pathway deregulation in the integrative subgroups 


Finally, we projected the molecular profiles of the integrative sub- 
groups onto pathways to examine possible biological themes among 
breast cancer subgroups (Supplementary Tables 46 and 47) and the 
relative impact of cis and trans expression modules on the pathways. 
The CNA-devoid (IntClust 4) group exhibits a strong immune and 
inflammation signature involving the antigen presentation pathway, 
OX40 signalling, and cytotoxic T-lymphocyte-mediated apoptosis 
(Supplementary Fig. 36). Given that trans-acting deletion hotspots 
were localized to the TRG and TRA loci and were associated with 
an adaptive immune response module, we asked whether these dele- 
tions contribute to alterations in this pathway. The CNA-devoid sub- 
group (IntClust 4) was found to exhibit nearly twice as many deletions 
(typically heterozygous loss) at the TRG and TRA loci (~20% of cases) 
as compared to the other subtypes (with the exception of IntClust 10), 
and deletions of both TCR loci were significantly associated with 
severe lymphocytic infiltration (χ² test, P<10 ° and P<10
respectively). Notably, these trans-associated mRNAs were signifi- 
cantly enriched in the immune response signature of the CNA-devoid 
subgroup (Supplementary Fig. 36) as well as among genes differentially 
expressed in CNA-devoid cases with severe lymphocytic infiltration 
(Supplementary Fig. 37). We conclude that genomic copy number loss 
Figure 4 | The integrative subgroups have distinct copy number profiles.
Genome-wide frequencies (F, proportion of cases) of somatic CNAs (y-axis,
upper plot) and the subtype-specific association (−log10 P-value) of aberrations
(y-axis, bottom plot) based on a χ² test of independence are shown for each of
the 10 integrative clusters. Regions of copy number gain are indicated in red 
and regions of loss in blue in the frequency plot (upper plot). Subgroups were 


at the TCR loci drives a trans-acting immune response module that 
associates with lymphocytic infiltration, and characterizes an otherwise 
genomically quiescent subgroup of ER-positive and ER-negative 
patients with good prognosis. These observations suggest the presence 
of mature T lymphocytes (with rearranged TCR loci), which may 
explain an immunological response to the cancer. In line with these 
findings, a recent study demonstrated the association between CD8+
lymphocytes and favourable prognosis. 

Also among the trans-influenced groups is IntClust 10 (basal-like 
cancer enriched subgroup), which harbours chromosome 5q dele- 
tions (Supplementary Fig. 21). Numerous signalling molecules, tran- 
scription factors and cell division genes were associated in trans with 
this deletion event in the basal cancers, including alterations in 
AURKB, BCL2, BUB1, CDCA3, CDCA4, CDC20, CDC45, CHEK1, 
FOXM1, HDAC2, IGF1R, KIF2C, KIFC1, MTHFD1L, RAD51AP1,
TTK and UBE2C (Supplementary Fig. 38). Notably, TTK (MPS1), a
dual specificity kinase that assists AURKB in chromosome alignment
during mitosis, and recently reported to promote aneuploidy in breast
cancer, was upregulated. These results indicate that 5q deletions
modulate the coordinate transcriptional control of genomic and 
chromosomal instability and cell cycle regulation within this subgroup. 

In contrast to these subtype-specific trans-associated signatures, 
the high-risk 11q13/14 subgroup was characterized by strong 
ordered by hierarchical clustering of their copy number profiles in the discovery
cohort (n = 997). For the validation cohort (n = 995), samples were classified
into each of the integrative clusters as described in the text. The number of cases
in each subgroup (n) is indicated as is the in-group proportion (IGP) and
associated P-value, as well as the distribution of PAM50 subtypes within each
cluster. 


cis-acting associations. Like the basal cancers, this subgroup also 
exhibited alterations in key cell-cycle-related genes (Supplementary 
Fig. 39), which probably have a role in its aggressive pathophysiology, 
but the nature of the signature differs. In particular, the regulation of 
the G1/S transition by BTG family proteins, which include CCND1, 
PPP2R1B and E2F2, was significantly enriched in the 11q13/14 cis- 
acting subgroup, but not the basal cancers, and this is consistent with 
CCND1 and the PPP2R subunit representing subtype-specific drivers 
in these tumours. 


Discussion 

We have generated a robust, population-based molecular subgroup- 
ing of breast cancer based on multiple genomic views. The size and 
nature of this cohort made it amenable to eQTL analyses, which can 
aid the identification of loci that contribute to the disease phenotype.
CNAs and SNPs influenced expression variation, with CNAs 
dominating the landscape in cis and trans. The joint clustering of 
CNAs and gene expression profiles further resolves the considerable 
heterogeneity of the expression-only subgroups, and highlights a 
high-risk 11q13/14 cis-acting subgroup as well as several other strong 
cis-acting clusters and a genomically quiescent group. The reproducibility 
of subgroups with these molecular and clinical features in a validation 
cohort of 995 tumours suggests that by integrating multiple genomic 
PP2A holoenzyme complex and MTAP, which have previously been
under-explored in breast cancer. The CNA-expression landscape also
illuminates rare but potentially significant events, including IGF1R,
KRAS and EGFR amplifications and CDKN2B, BRCA2, RB1, ATM,
SMAD4, NCOR1 and UTX homozygous deletions. Although some of
these events have low overall frequencies (<1% patients) (Fig. 2,
Supplementary Fig. 15 and Supplementary Tables 22-24), they may 
have implications for understanding therapeutic responses to targeted 
agents, particularly those targeting tyrosine kinases or phosphatases. 

Finally, because the integrative subgroups occur at different 
frequencies in the overall population, focusing sequencing efforts 
on representative numbers from these groups will help to establish 
a comprehensive breast cancer somatic landscape at sequence-level 
resolution. For example, a significant number (~17%, n = 167 in the 
discovery cohort) of breast cancers are devoid of somatic CNAs, and 
are ripe for mutational profiling. Our work provides a definitive 
framework for understanding how gene copy number aberrations 
affect gene expression in breast cancer and reveals novel subgroups 
that should be the target of future investigation. 


METHODS SUMMARY 


All patient specimens were obtained with appropriate consent from the relevant 
Figure 5 | The integrative subgroups have distinct clinical outcomes.

a, Kaplan-Meier plot of disease-specific survival (truncated at 15 years) for the 
integrative subgroups in the discovery cohort. For each cluster, the number of 
samples at risk is indicated as well as the total number of deaths (in 
parentheses). b, 95% confidence intervals for the Cox proportional hazard 
ratios are illustrated for the discovery and validation cohort for selected values 
of key covariates, where each subgroup was compared against IntClust 3. 
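The survival curves summarised in Fig. 5a rest on the Kaplan-Meier product-limit estimator, which can be sketched in a few lines. The input layout (parallel lists of follow-up times and event flags) and the function name are assumptions for illustration; the study's own survival analysis and its Cox models are described in its Methods, not reproduced here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient (for example, in years).
    events: True if the patient died of disease at that time,
            False if the observation was censored.
    Returns a list of (time, survival_probability) at each event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored cases leave the risk set
    return curve
```

Truncating follow-up at 15 years, as in the figure, would correspond to treating any patient still under observation at that point as censored at 15.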




features it may be possible to derive more robust patient classifiers. We 
show here, for the first time, that subtype-specific trans-acting aberra- 
tions modulate concerted transcriptional changes, such as the TCR 
deletion-mediated adaptive immune response that characterizes the 
CNA-devoid subgroup and the chromosome 5 deletion-associated cell 
cycle program in the basal cancers. 

The integrated CNA-expression landscape highlights a limited 
number of genomic regions that probably contain driver genes, 
including ZNF703, which we recently described as a luminal B specific 
driver, as well as somatic deletion events affecting key subunits of the


institutional review board. DNA and RNA were isolated from samples and 
hybridized to the Affymetrix SNP 6.0 and Illumina HT-12 v3 platforms for 
genomic and transcriptional profiling, respectively. A detailed description of 
the experimental assays and analytical methods used to analyse these data is
available in the Supplementary Information.
Whole-genome analysis informs breast
cancer response to aromatase inhibition
To correlate the variable clinical features of oestrogen-receptor-positive breast cancer with somatic alterations, we
studied pretreatment tumour biopsies accrued from patients in two studies of neoadjuvant aromatase inhibitor therapy 
by massively parallel sequencing and analysis. Eighteen significantly mutated genes were identified, including five genes 
(RUNX1, CBFB, MYH9, MLL3 and SF3B1) previously linked to haematopoietic disorders. Mutant MAP3K1 was associated 
with luminal A status, low-grade histology and low proliferation rates, whereas mutant TP53 was associated with the 
opposite pattern. Moreover, mutant GATA3 correlated with suppression of proliferation upon aromatase inhibitor 
treatment. Pathway analysis demonstrated that mutations in MAP2K4, a MAP3K1 substrate, produced similar 
perturbations as MAP3K1 loss. Distinct phenotypes in oestrogen-receptor-positive breast cancer are associated with 
specific patterns of somatic mutations that map into cellular pathways linked to tumour biology, but most recurrent 
mutations are relatively infrequent. Prospective clinical trials based on these findings will require comprehensive 
genome sequencing. 


Oestrogen-receptor-positive breast cancer exhibits highly variable 
prognosis, histological growth patterns and treatment outcomes. 
Neoadjuvant aromatase inhibitor treatment trials provide an opportunity 
to document oestrogen-receptor-positive breast cancer phenotypes in a 
setting where sample acquisition is easy, prospective consent for geno- 
mic analysis can be obtained and responsiveness to oestrogen depriva- 
tion therapy is documented’. We therefore conducted massively parallel 
sequencing (MPS) on 77 samples accrued from two neoadjuvant 
aromatase inhibitor clinical trials*’. Forty-six cases underwent 
whole-genome sequencing (WGS) and 31 cases underwent exome 
sequencing, followed by extensive analysis for somatic alterations 
and their association with aromatase inhibitor response. Case selection 
for discovery was based on the levels of the tumour proliferation 
marker Ki67 in the surgical specimen, because high cellular prolifera- 
tion despite aromatase inhibitor treatment identifies poor prognosis 
tumours exhibiting oestrogen-independent growth* (Supplementary 
Fig. 1). Twenty-nine samples had Ki67 levels above 10% (‘aromatase- 
inhibitor-resistant tumours’, median Ki67 21%, range 10.3-80%) and 
48 were at or below 10% (‘aromatase-inhibitor-sensitive tumours’, 
median Ki67 1.2%, range 0-8%). Cases were also classified as luminal 
A or B by gene expression profiling. We subsequently examined inter- 
actions between Ki67 biomarker change, histological categories, 
intrinsic subtype and mutation status in selected recurrently mutated 
genes in 310 cases overall. Pathway analysis was applied to contrast 
the signalling perturbations in aromatase-inhibitor-sensitive versus 
aromatase-inhibitor-resistant tumours. 
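The Ki67-based grouping described above can be sketched in a few lines. This is a minimal illustration that assumes only the 10% on-treatment cut-off stated in the text; the function name and the example values are hypothetical and are not study data.

```python
from statistics import median

KI67_CUTOFF = 10.0  # percent; the on-treatment threshold stated in the text


def classify_ki67(on_treatment_ki67: float) -> str:
    """Label a tumour by its on-treatment (surgical specimen) Ki67 percentage."""
    return "resistant" if on_treatment_ki67 > KI67_CUTOFF else "sensitive"


# Hypothetical Ki67 percentages, for illustration only (not study data).
example_ki67 = {"case_a": 21.0, "case_b": 1.2, "case_c": 10.0, "case_d": 10.3}

labels = {case: classify_ki67(value) for case, value in example_ki67.items()}
resistant_values = [v for v in example_ki67.values() if classify_ki67(v) == "resistant"]

print(labels)
print("median Ki67 in resistant group:", median(resistant_values))
```

A value of exactly 10% falls in the sensitive group, matching the "at or below 10%" wording.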


Results 

The mutation landscape of luminal-type breast cancer 

Using paired-end MPS, 46 tumour and normal genomes were 
sequenced to at least 30-fold and 25-fold haploid coverage, respectively, 
with diploid coverage of at least 95% based on concordance with SNP 
array data (Supplementary Table 1). Candidate somatic events were 
identified using multiple algorithms”®, and were then verified by hybrid- 
ization capture-based validation that targeted all putative somatic single- 
nucleotide variants (SNVs) and small insertions/deletions (indels) that 
overlap coding exons, splice sites and RNA genes (tier 1), high-confidence 
SNVs and indels in non-coding conserved or regulatory 
regions (tier 2), as well as non-repetitive regions of the human genome 
(tier 3). In addition, somatic structural variants and germline structural 
variants that potentially affect coding sequences (Supplementary 
Information) were assessed. Digital sequencing data from captured 
target DNAs from the 46 tumour and normal pairs (Supplementary 
Table 2 and Supplementary Information) confirmed 81,858 mutations 
(point mutations and indels) and 773 somatic structural variants. The 
average numbers of somatic mutations and structural variants were 
1,780 (range 44-11,619) and 16.8 (range 0-178) per case, respectively 
(Supplementary Table 3). Tier 1 point mutations and small indels 
predicted for all 46 cases also were validated using both 454 and 
Illumina sequencing (Supplementary Information). BRC25 was a clear 
outlier with only 44 validated tiers 1-3 mutations, all at low allele 
frequencies (ranging from 5% to 26.8%). This sample probably had 
low tumour content despite histopathology assessment, but the data 
are included to avoid bias. 

The overall mutation rate was 1.18 validated mutations per megabase 
(Mb) (tier 1: 1.05; tier 2: 1.14; tier 3: 1.20). The mutation rate for tier 1 
was higher than that observed for acute myeloid leukaemia (0.18- 
0.23)°’, but lower than that reported for hepatocellular carcinoma 
(1.85)%, malignant melanoma (6.65)° and lung cancers (3.05-8.93)'°"" 
(Supplementary Table 4). The background mutation rate (BMR) across 
the 21 aromatase-inhibitor-resistant tumours was 1.62 per Mb, nearly 
twice that of the 25 aromatase-inhibitor-sensitive tumours at 0.824 per 
Mb (P= 0.02, one-sided t-test). A trend for more somatic structural 
variations in the aromatase-inhibitor-resistant group was also observed, 
as the validated somatic structural variation frequency in the 21 
aromatase-inhibitor-resistant tumour genomes was 21.69 versus an 
average of 12.76 in 25 aromatase-inhibitor-sensitive tumours 
(P = 0.16, one-sided t-test) (Fig. 1). If ten TP53 mutated cases were 
excluded, the background mutation rate still tended to be higher in the 
aromatase-inhibitor-resistant group (P = 0.08). To demonstrate that a 
single-tumour core biopsy produced representative genomic data, 
whole-genome sequencing of two pre-treatment biopsies was con- 
ducted for 5 of the 46 cases. The frequency of mutations in the paired 
specimens showed high concordance in all cases (correlation 
coefficients ranged from 0.74 to 0.95) (Supplementary Fig. 2) and a 
somatic mutation was infrequently detected in only one of the two 
samples (4.65% overall). 
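The rate comparison above rests on two simple quantities: mutations per megabase and a two-group t statistic. The sketch below assumes Welch's unequal-variance form, which the text does not specify, and uses hypothetical per-tumour rates rather than the study's values. Turning the statistic into a one-sided P value needs the t distribution, which is left to a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def mutations_per_mb(n_mutations: int, covered_bases: int) -> float:
    """Validated mutations per megabase of adequately covered sequence."""
    return n_mutations / (covered_bases / 1_000_000)


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom for mean(a) - mean(b)."""
    var_a = variance(sample_a) / len(sample_a)
    var_b = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(sample_a) - 1) + var_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical per-tumour rates (mutations per Mb), not the study's values.
resistant = [1.9, 1.1, 2.4, 1.5, 1.2]
sensitive = [0.7, 0.9, 1.0, 0.6, 0.9]

t_stat, dof = welch_t(resistant, sensitive)
print(f"rate example: {mutations_per_mb(3540, 3_000_000_000):.2f} per Mb")
print(f"Welch t = {t_stat:.2f}, df = {dof:.1f}")
```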


Significantly mutated genes in luminal breast cancer 
The discovery effort was extended by studying 31 additional cases by 
exome sequencing, producing an additional 1,371 tier 1 mutations. In 
total the 77 cases yielded 3,355 tier 1 somatic mutations, including 
3,208 point mutations, 1 dinucleotide mutation and 146 indels, 
ranging from 1 to 28 nucleotides. The point mutations included 733 
silent, 2,145 missense, 178 nonsense, 6 read-through, 69 splice-site 
mutations and 77 in RNA genes (Supplementary Table 5). Of 2,145 
missense mutations, 1,551 were predicted to be deleterious by SIFT” 
and/or PolyPhen**. The MuSiC package* was applied to determine 
the significance of the difference between observed versus expected 
mutation events in each gene, on the basis of the background 
mutation rate. This identified 18 significantly mutated genes with a 
convolution false discovery rate (FDR) <0.26 (Table 1 and Sup- 
plementary Table 6). The list contains genes previously identified as 
mutated in breast cancer (PIK3CA™, TP53'°, GATA3”, CDH1', 
RB1'°, MLL3"’", MAP3K1** and CDKN1B"”) as well as genes not previously 
observed in clinical breast cancer samples, including TBX3, 
RUNX1, LDLRAP1, MYH9, AGTR2, STMN2, SF3B1 and 
CBFB. 
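The idea of testing observed against expected mutation counts can be illustrated with a deliberately simplified model. The sketch below assumes a Poisson background with a single rate per base and applies a Benjamini-Hochberg correction; it is not the MuSiC implementation, whose tests (including the convolution test mentioned above) are more elaborate. Gene names, lengths and counts are hypothetical, and the tier 1 rate from the text is reused purely for illustration.

```python
from math import exp, lgamma, log


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summing the tail directly."""
    if k <= 0:
        return 1.0
    total, i = 0.0, k
    while True:
        term = exp(-lam + i * log(lam) - lgamma(i + 1))
        total += term
        # Past the mode the terms only shrink, so stopping here is safe.
        if i > lam and term <= total * 1e-15:
            return min(total, 1.0)
        i += 1


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


BMR_PER_BASE = 1.05e-6  # tier 1 rate quoted in the text (1.05 per Mb), reused as an assumption
N_TUMOURS = 77

# Hypothetical genes: (coding length in bp, observed mutation count).
genes = {"GENE_A": (4_500, 13), "GENE_B": (30_000, 3), "GENE_C": (1_500, 2)}

p_values = {
    name: poisson_upper_tail(count, BMR_PER_BASE * length * N_TUMOURS)
    for name, (length, count) in genes.items()
}
fdr = dict(zip(p_values, benjamini_hochberg(list(p_values.values()))))

for name in genes:
    print(f"{name}: P = {p_values[name]:.3g}, FDR = {fdr[name]:.3g}")
```

A long gene with three mutations (GENE_B) is unremarkable under this model, whereas a short gene with thirteen (GENE_A) is not, which is the intuition behind length-aware significance testing.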

Thirteen mutations (3 nonsense, 6 frame-shift indels, 2 in-frame 
deletions and 2 missense) were identified in MAP3K1 (Table 1 and 
Fig. 2), a serine/threonine kinase that activates the ERK and JNK kinase 
pathways through phosphorylation of MAP2K1 and MAP2K4 (ref. 20). 
Of interest, a missense (S184L) and a splice-region mutation (e2+3 
probably affecting splicing) in MAP2K4 were observed in two tumours 
with no MAP3K1 mutation (Fig. 2). Single nonsynonymous mutations 
in MAP3K12, MAP3K4, MAP4K3, MAP4K4, MAPK15 and MAPK3 
were also detected (Supplementary Table 5). TBX3 harboured three 
small indels (one insertion and two deletions). TBX3 affects expansion 
of breast cancer stem-like cells through regulation of FGFR*’. Two 
truncating mutations in the tumour suppressor CDKN1B were 


Figure 1 | Genome-wide somatic mutations. Circos plots“ indicate validated 
somatic mutations comprising tier 1 point mutations and indels, genome-wide 
copy number alterations, and structural rearrangements in six representative 
genomes. Three on-treatment Ki67 less than or at 10% (top panel: BRC15, 
BRC17 and BRC22) and three on-treatment Ki67 greater than 10% (bottom 
panel: BRC44, BRC47 and BRC50) cases are shown. Significantly mutated 
genes are highlighted in red. No purity-based copy number corrections were 
used for plotting copy number. 
Table 1 | Significantly mutated genes identified in 46 whole genomes 
and 31 exomes sequenced in luminal breast cancer patients 


Gene Total MS NS Indel SS P value FDR 
MAP3K1 13 2 3 8 0 (0) ) 
PIK3CA 45 44 0 1 0O (0) ) 
TP53 1813 1 1 1 (0) ) 
GATA3 8 0 4 3 115x109 7.41 x10°'° 
CDH1 8 1 5 1 3007x1075 159x101! 
TBX3 3 0 0 3 0 258x10-% 0.01 

ATR 6 6 0 0 0 373x10° 0.014 
RUNX1 4 4 0 0 0 659x10~° 0.02 
ENSG00000212670* 2 2 0 0 0 231x10% 0.066 
RB1 4 2 0 1 276x10% 0.07 
LDLRAP1 2 0 O 427x10% 0.092 
STMN2 2 0 1 0 415x1075 0.092 
MYH9 4 2 0 896x10% 0.178 
MLL3 5 3 0 104x104 0.19 
CDKN1B 2 0 1 0 139x104 0.240 
AGTR2 2 2 0 0 0. 1,71 x10~4 0.256 
SF3B1 3 3 0 0 0 179x104 0.256 
CBFB 2 0 O 1.70x10~% 0.256 


*ENSG00000212670 is not in RefSeq release 50. 
MS, Missense; NS, nonsense; SS, splice site. 


identified’’. Four missense RUNX1 mutations were observed, with 
three in the RUNT domain clustered within the 8 amino acid putative 
ATP-binding site (R166Q, G168E and R169K). RUNX1 is a transcrip- 
tion factor affected by mutation and translocation in the M2 subtype of 
acute myeloid leukaemia” and is implicated in tethering the oestrogen 
receptor to promoters independently of oestrogen response elements”. 
Two mutations (N104S and N140*) were also identified in CBFB, the 
binding partner of RUNX1. Additional mutations included 3 missense 
mutations (2 K700E and 1 K666Q) in SF3B1, a splicing factor implicated in 
myelodysplasia™* and chronic lymphocytic leukaemia”. One missense 
mutation, one nonsense mutation and two indels were found in the 
MYH9 gene, involved in hereditary macrothrombocytopenia” as well 
as being observed in an ALK translocation in anaplastic large cell 
lymphoma”. 

We also identified three significantly mutated genes (LDLRAP1, 
AGTR2 and STMN2) not previously implicated in cancer. A missense 
and a nonsense mutation were observed in LDLRAP1, a gene asso- 
ciated with familial hypercholesterolaemia”’. AGTR2, angiotensin II 
receptor type 2, harboured two missense mutations (V184I and 
R251H). Angiotensin signalling and oestrogen receptor intersect in 
models of tissue fibrosis”. STMN2, a gene activated by JNK family 
kinases*’*’ and therefore regulated by MAP3K1 and MAP2K4, 
harboured one frameshift deletion and one missense mutation. 
Three deletions and one point mutation (Supplementary Fig. 3) were 
identified in a large, infrequently spliced non-coding (lnc) RNA gene, 
MALAT1 (metastasis associated lung adenocarcinoma transcript 1), 
that regulates alternative splicing by modulating the phosphorylation 
of SR splicing factor**. Translocations and point mutations of MALAT1 
have been reported in sarcoma®™ and colorectal cancer cell lines**. Five 
additional MALAT1 mutations were found in the recurrent screening 
set (Supplementary Table 5d). The locations of these mutations 
clustered in a region of species homology (F1 and 2 domains) that 
could mediate interactions with SRSF1 (ref. 32, Supplementary Fig. 4). 
Non-coding mutation clusters were found in ATR, GPR126 and NRG3 
(Supplementary Information and Supplementary Table 7). 


Correlating mutations with clinical data 

To study clinical correlations, mutation recurrence screening was 
conducted on an additional 240 cases (Supplementary Table 8 and 
Supplementary Fig. 1). By combining WGS, exome and recurrence 
screening data, we determined the mutation frequency in PIK3CA to 
be 41.3% (131 of 317 tumours) (Supplementary Table 5a—d and 
Supplementary Fig. 3). TP53 was mutated in 51 of 317 tumours 
(16.1%) (Supplementary Table 5a-d and Supplementary Fig. 3). 
Additionally, 52 nonsynonymous MAP3K1 mutations in 39 tumours 
and 10 mutations in its substrate MAP2K4 were observed, represent- 
ing a combined case frequency of 15.5% (Supplementary Table 5a—d 
and Fig. 3). Of note, 52 of the 62 non-silent mutations in MAP3K1 and 
MAP2K4 were scattered indels or other protein-truncating events 
strongly suggesting functional inactivation. In addition, 13 tumours 
harboured two non-silent MAP3K1 mutations, indicative of bi-allelic 
loss and reinforcing the conclusion that this gene is a tumour suppressor. 
Twenty-nine tumours harboured a total of 30 mutations in 
GATA3, consisting of 25 truncation events, one in-frame insertion, 
and 4 missense mutations including 3 recurrent mutations at M294K 
(Supplementary Table 5a-d and Supplementary Fig. 3). BRC8 
harboured a chromosome 10 deletion that includes GATA3. CDH1 
mutation data were available for 169 samples and, as expected, its 
mutation status was strongly associated with lobular breast cancer’? 
(Table 2a). We applied a permutation-based approach in MuSiC* to 
ascertain relationships between mutated genes. Negative correlations 
were found between mutations in gene pairs such as GATA3 and 
PIK3CA (P = 0.0026), CDH1 and GATA3 (P = 0.015), and CDH1 
and TP53 (P = 0.022). MAP3K1 and MAP2K4 mutations were mutu- 
ally exclusive, albeit without reaching statistical significance (P = 0.3). 
In contrast, a positive correlation between MAP3K1/MAP2K4 and 
PIK3CA mutations was highly significant (P = 0.0002) (Supplemen- 
tary Table 9). 
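A permutation test for co-occurrence or mutual exclusivity of two mutated genes can be sketched as follows. The exact permutation scheme used in MuSiC is not described here, so this version simply shuffles one gene's mutation labels across tumours; it is an assumption made for illustration, as is the toy data.

```python
import random


def cooccurrence_permutation_test(gene_a, gene_b, n_permutations=10_000, seed=0):
    """Permutation P values for co-occurrence and for mutual exclusivity.

    gene_a, gene_b: equal-length lists of booleans (mutated or not), one per tumour.
    """
    rng = random.Random(seed)
    observed = sum(a and b for a, b in zip(gene_a, gene_b))
    shuffled = list(gene_b)
    at_least, at_most = 0, 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        overlap = sum(a and b for a, b in zip(gene_a, shuffled))
        at_least += overlap >= observed
        at_most += overlap <= observed
    # The add-one correction keeps the estimates away from exactly zero.
    p_cooccur = (at_least + 1) / (n_permutations + 1)
    p_exclusive = (at_most + 1) / (n_permutations + 1)
    return observed, p_cooccur, p_exclusive


# Hypothetical cohort of 40 tumours in which the two genes never overlap.
gene_a = [True] * 10 + [False] * 30
gene_b = [False] * 10 + [True] * 10 + [False] * 20

observed, p_cooccur, p_exclusive = cooccurrence_permutation_test(gene_a, gene_b)
print(observed, p_cooccur, p_exclusive)
```

With few mutated cases, even a complete absence of overlap may not reach significance, which is consistent with the non-significant MAP3K1/MAP2K4 exclusivity reported above.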

Two independent mutation data sets, designated ‘Set 1’ (discovery 
cohort) and ‘Set 2’ (validation cohort), from these clinical trial samples 
were analysed separately and then in combination, with a false discovery 
rate (FDR)-corrected P value to gauge the overall strength and 
Figure 2 | MAP3K1 and MAP2K4 mutations observed in 317 samples. 
Somatic status of all mutations was obtained by Sanger sequencing of PCR 
products or Illumina sequencing of targeted capture products. The locations of 
conserved protein domains are highlighted. Each nonsynonymous 
substitution, splice site mutation or indel is designated with a circle at the 
representative protein position with colour to indicate translation effects of the 
mutation. Asterisk, nonsense mutations that cause truncation of the open 
reading frame. 
Figure 3 | Structural variants in significantly mutated or frequently deleted 
genes. One MAP3K1 deletion in BRC49 and one MAP2K4 deletion in BRC47, 
and one ELP3-NRG1 fusion in BRC49 identified using Illumina paired-end 


consistency of genotype-phenotype relationships (Table 2a, b and 
Supplementary Fig. 1). TP53 mutations in both data sets correlated 
with significantly higher Ki67 levels, both at baseline (P = 0.0003) and 
at surgery (P= 0.001). Furthermore, TP53 mutations were signifi- 
cantly enriched in luminal B tumours (P = 0.04) and in higher histo- 
logical grade tumours (P = 0.02). In contrast, MAP3K1 mutations 
were more frequent in luminal A tumours (P = 0.02), in grade 1 
tumours (P= 0.005) and in tumours with lower Ki67 at baseline 
(P = 0.001) with consistent findings across both data sets. GATA3 
mutation did not influence baseline Ki67 levels but was enriched in 
samples exhibiting greater percentage Ki67 decline (P = 0.01). This 
finding requires further verification because it was significant in Set 
1 (uncorrected P value 0.003) but was a marginal finding in Set 2 
(P = 0.08). However, it suggests GATA3 mutation may be a positive 
predictive marker for aromatase inhibitor response. 
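The kind of categorical comparison summarized in Table 2a can be reproduced in outline with Fisher's exact test. The sketch below uses the combined TP53 counts given there (13 of 140 luminal A and 38 of 177 luminal B tumours mutated) and a plain two-sided test written from scratch. It yields an unadjusted P value, so it need not match the FDR-adjusted whole-set column of the table.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_observed * (1 + 1e-9))


# Rows: luminal A, luminal B; columns: TP53 mutant, TP53 wild type.
p_tp53_subtype = fisher_exact_two_sided(13, 140 - 13, 38, 177 - 38)
print(f"unadjusted two-sided P = {p_tp53_subtype:.4f}")
```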


Structural variation and DNA repair mechanisms 
Analysis of copy number alterations (CNAs) revealed arm-level gains 
for 1q, 5p, 8q, 16p, 17q, 20p and 20q and arm-level losses for 1p, 8p, 
16q, and 17p in the 46 WGS tumour genomes (Supplementary Fig. 5). 
A total of 773 structural variants (579 deletions, 189 translocations 
and 5 inversions) identified by WGS were validated as somatic in 46 
breast cancer genomes by capture validation. No recurrent transloca- 
tions were detected but six in-frame fusion genes were validated by 
reverse transcription followed by PCR (Supplementary Information 
and Supplementary Tables 10-13). Seven tumours had multiple com- 
plex translocations with breakpoints suggestive of a catastrophic 
mitotic event (‘chromothripsis’; Supplementary Table 11). Analysis 
of the structural variant genomic breakpoints shows the spectra of 
putative chromothripsis-related events are the same as seen for other 
somatic events, with the majority of structural variants arising from 
non-homologous end-joining. We classified somatic (mitotic) and 
germline (meiotic) structural variants into four groups: variable 
number tandem repeat (VNTR), non-allelic homologous recombina- 
tion (NAHR), microhomology-mediated end joining (MMEJ), and 
non-homologous end joining (NHEJ), according to criteria described 
in Supplementary Information. The fraction of each classification is 
shown for germline and somatic (mitotic) events (Supplementary 
Table 14). There were significantly more somatic NHEJ events in 
tumour genomes than the other three types (P < 2.2 × 10^-16), 


Pathways relevant to aromatase inhibitor response 


Pathscan* analysis (Supplementary Table 15 and Supplementary 
Information) indicated that somatic mutations detected in the 77 
discovery cases affect a number of pathways, including caspase 
cascade/apoptosis, ErbB signalling, Akt/PI3K/mTOR signalling, 
TP53/RB signalling and MAPK/JNK pathways (Fig. 4a). To discern 
the pathways relevant to aromatase inhibitor sensitivity, we con- 
ducted separate pathway analyses for aromatase-inhibitor-sensitive 
versus aromatase-inhibitor-resistant tumours. Whereas the majority 
of top altered pathways (FDR ≤ 0.15) in each group are shared, 
several pathways were enriched in the aromatase-inhibitor-resistant 
group, including the TP53 signalling pathway, DNA replication, and 
mismatch repair. Specifically, 38% of the aromatase-inhibitor- 
resistant group (11 of 29 tumours) have mutations in the TP53 
pathway with three having double or triple hits involving TP53, 
ATR, APAF1 or THBS1. In contrast, only 16.6% (8 of 48 tumours) 
of the Ki67 low group had mutations in the TP53 signalling pathway, 
each with only a single hit in genes TP53, ATR, CCNE2 or IGF1 
(Supplementary Table 16). 

GeneGo pathway analysis of MetaCore interacting network objects 
was used to identify genes in the 77 luminal breast cancers with low- 
frequency mutations that cluster into pathway maps. Eight networks 
assembled from significant maps encompassed mutations from 71 
(92%) of the tumours (Fig. 4b). Many of the network objects shared 
pathways with significantly mutated genes such as TP53, MAP3K1, 
PIK3CA and CDH1. GeneGo analysis also revealed that several genes 
with low-frequency mutations were actually subunits of complexes, 
resulting in higher mutation rates for that object, for example, the 
condensin complex (4 mutations in 4 genes) and the MRN complex (4 
mutations in 3 genes). Several pathways without multiple significantly 
mutated genes, such as the apoptotic cascade, calcium/phospholipase 
signalling and G-protein-coupled receptors, were significantly affected 
by low-frequency mutations. Grouping tumours by significantly 
mutated genes and pathway mutation status showed that whereas 55 
(71%) of the tumours contained significantly mutated genes in signifi- 
cant pathways, an additional 16 (21%) contained only non-significantly 
mutated genes in these pathways. Thus, tumours without a given sig- 
nificantly mutated gene often had other mutations in the same relevant 
pathway (Fig. 4b, Supplementary Fig. 6, Supplementary Table 17 and 
Supplementary Information). 
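Two pieces of arithmetic in this paragraph can be made explicit: pooling mutations over the subunits of a complex, and the share of tumours covered by the networks. In the sketch below the subunit lists are standard complex memberships, but the individual mutation calls are hypothetical placeholders chosen only to reproduce the counts quoted in the text (4 mutations in 4 genes, and 4 mutations in 3 genes).

```python
from collections import Counter

# Subunit membership of the two complexes named in the text.
COMPLEXES = {
    "condensin": {"SMC2", "SMC4", "NCAPD2", "NCAPG", "NCAPH"},
    "MRN": {"MRE11", "RAD50", "NBN"},
}

# Hypothetical (tumour, gene) calls; the mutated subunits are not stated in the text.
hypothetical_mutations = [
    ("t01", "SMC2"), ("t02", "SMC4"), ("t03", "NCAPD2"), ("t04", "NCAPG"),
    ("t05", "RAD50"), ("t06", "RAD50"), ("t07", "NBN"), ("t08", "MRE11"),
]


def mutations_per_complex(mutations, complexes):
    """Return {complex: (number of mutations, number of distinct genes hit)}."""
    counts = Counter()
    genes_hit = {name: set() for name in complexes}
    for _tumour, gene in mutations:
        for name, members in complexes.items():
            if gene in members:
                counts[name] += 1
                genes_hit[name].add(gene)
    return {name: (counts[name], len(genes_hit[name])) for name in complexes}


summary = mutations_per_complex(hypothetical_mutations, COMPLEXES)
print(summary)

# Coverage figures quoted in the text, out of 77 discovery tumours.
n_tumours, with_smg, pathway_only = 77, 55, 16
for label, n in [("SMG in pathway", with_smg),
                 ("non-SMG only", pathway_only),
                 ("any network", with_smg + pathway_only)]:
    print(f"{label}: {n}/{n_tumours} = {100 * n / n_tumours:.0f}%")
```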

We also applied PARADIGM” to infer pathway-informed gene 
activities using gene expression and copy-number data to identify 
several ‘hubs’ of activity (Supplementary Fig. 7, Supplementary Fig. 8 
and Supplementary Information). As expected, ESR1 and FOXA1 were 
among the hubs activated cohort-wide while other hubs exhibited high 
but differential changes in aromatase-inhibitor-resistant tumours 
including MYC, FOXM1 and MYB (Supplementary Fig. 8). The concordance 
among the 104 MetaCore maps from GeneGo analysis 
described above is significant, with 75 (72%) matching one of the 
Figure 4 | Key cancer pathway components altered in luminal breast 
tumours. a, Only genetic alterations identified in 46 WGS cases are shown. 
Alterations were discovered in key genes in the TP53/RB, MAPK, PI3K/AKT/ 
mTOR pathways. Genes coloured blue and red are predicted to be functionally 
inactivated and activated, respectively, through focused mutations including 
point mutations and small indels (M), copy number deletions (C), or other 
structural changes (S) that affect the gene. The inter-connectedness of this 
network (several pathways) shows that there are many different ways to perturb 
a pathway. b, Eight interaction networks from canonical maps are significantly 


PARADIGM subnetworks at the 0.05 significance level after multiple 
test correction (P< 4.4 X 10°; Bonferroni-adjusted hypergeometric 
test) (Supplementary Fig. 9). We identified significant subnetworks 
associated with Ki67 biomarker status (Supplementary Fig. 10 and Sup- 
plementary Information) involving transcription factors controlling 
large regulons. 
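The overlap test named above can be sketched as a hypergeometric upper tail followed by a Bonferroni correction. The gene-universe size, set sizes, overlap and number of tests below are hypothetical, since the text does not give them; only the form of the calculation is illustrated.

```python
from math import comb


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) when drawing `draws` items without replacement."""
    denom = comb(population, draws)
    upper = min(successes, draws)
    return sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    ) / denom


def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical numbers: a 20-gene map sharing 9 genes with a 60-gene
# subnetwork, drawn from a universe of 500 genes, with 104 tests in total.
raw_p = hypergeom_upper_tail(9, population=500, successes=60, draws=20)
adjusted_p = bonferroni(raw_p, n_tests=104)
print(f"raw P = {raw_p:.3g}, Bonferroni-adjusted P = {adjusted_p:.3g}")
```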

The PARADIGM-inferred pathway signatures were further used to 
derive a map of the genetic mechanisms that may underlie treatment 
response. A subnetwork was constructed in which interactions were 
retained only if they connected two features with higher than average 
absolute association with Ki67 biomarker status (Supplementary Figs 
10 and 11 and Supplementary Information). Consistent with the 
PathScan results, among the largest of the hubs in the identified 
network were a central DNA damage hub with the second highest 
connectivity (55 regulatory interactions; 1% of the network) and TP53 
with the 14th highest connectivity (26 connections; 0.5% of the network). 
Additional highly connected hubs identified in order of connectivity 
were MYC with 79 connections (1.4%), FYN with 45 (0.8%), MAPK3 
over-represented by mutations in 77 luminal breast tumours (46 WGS and 31 
exome cases). In the concentric circle diagram, tumours are arranged as radial 
spokes and categorized by their mutation status in each network (concentric 
ring colour) and significantly mutated gene mutation status (black dots). 
Tumour classification by pathway analysis shows many tumours unaffected by 
a given significantly mutated gene often harbour other mutations in the same 
network. For full annotation, see Supplementary Information and 
Supplementary Fig. 6. PLC, phospholipase C; SMG, significantly mutated gene. 


with 43, JUN with 40, HDAC1 with 40, SHC1 with 39, and HIF1A/ 
ARNT complex with 39 (Supplementary Fig. 11). 
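Hub connectivity of the kind quoted here is a degree count over the network's interactions, expressed as a share of all interactions. The sketch below uses a toy edge list with hypothetical partner nodes. As a rough consistency check, 79 connections at 1.4% would imply a network of the order of 5,600 interactions.

```python
from collections import Counter


def hub_connectivity(edges):
    """Count interactions per node and express each as a share of all edges."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    total = len(edges)
    return [(node, n, n / total) for node, n in degree.most_common()]


# Toy interaction list; partner names A to D are placeholders.
toy_edges = [
    ("MYC", "A"), ("MYC", "B"), ("MYC", "C"),
    ("TP53", "A"), ("TP53", "D"),
    ("JUN", "B"),
]

ranked = hub_connectivity(toy_edges)
for node, n, share in ranked[:3]:
    print(f"{node}: {n} connections ({share:.0%} of the network)")
```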

To identify higher-level connections between mutations and 
clinical features, we compared the samples on the basis of pathway- 
derived signatures. For each clinical attribute and each significantly 
mutated gene, we dichotomized the discovery samples into a positive 
and a negative group to derive pathway signatures that discriminated 
between the groups (see details in Supplementary Information). We 
then computed all pair-wise Pearson correlations between pathway 
signatures and clustered the resulting correlations (Fig. 5). The entire 
process was repeated using validated mutations and signatures 
derived from the validation set (Supplementary Fig. 12). In line with 
expectation, PIK3CA, MAP3K1, MAP2K4, and low risk preoperative 
endocrine prognostic index (PEPI) scores (PEPI is an index of 
recurrence risk post neoadjuvant aromatase inhibitor therapy*) 
cluster with the luminal A subtypes and with each other, and are 
supported by the validation set analysis. The luminal B-like signatures 
included TP53, RB1, RUNX1 and MALAT1, which also associated 
Figure 5 | Pathway signatures reveal connections between mutations and 
clinical outcomes. PARADIGM-based pathway signatures were derived for 
tumour feature dichotomies including mutation driven gene signatures 
(mutant versus non-mutant), histopathology type (lobular versus ductal), 
preoperative endocrine prognostic index (PEPI) score (PEPI = 0 favourable 
versus PEPI >0 unfavourable), PAM50 (50-gene intrinsic breast cancer 
subtype classifier) luminal A subtype (luminal A versus luminal B) and the 
reverse (luminal B versus luminal A), histopathology grade (grades II and III 
versus I), baseline Ki67 levels (≥ 14% versus < 14%), and end-of-treatment 
Ki67 levels (≥ 10% versus < 10%) and overall PEPI score (higher than mean 
unfavourable versus lower than mean favourable). Pearson correlations were 
computed between all pair-wise signatures; positive correlations, red; negative 
correlations, blue; column features ordered identically as rows. Correlation 
analysis on the 77 samples in the discovery set is shown. Asterisk: Ki67 < 2.7%, 
oestrogen-receptor-positive, node negative and tumour size ≤ 5 cm. 


Table 2 | Correlations between mutations and clinical features 


with other poor outcome features such as high baseline and surgical 
Ki67 levels, high grade histology and high PEPI scores. The TP53 and 
MALAT1 associations in the discovery set also were supported by the 
validation set analysis. 
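The signature comparison described above amounts to a matrix of pairwise Pearson correlations, which is then clustered. A minimal sketch of the correlation step follows; the signature vectors are hypothetical stand-ins for PARADIGM pathway signatures, and the clustering step is omitted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(signatures):
    """All pairwise correlations, as a nested dict keyed by signature name."""
    names = list(signatures)
    return {a: {b: pearson(signatures[a], signatures[b]) for b in names} for a in names}


# Hypothetical signatures: one score per pathway feature.
signatures = {
    "MAP3K1_mutant": [0.8, -0.2, 0.5, -0.6],
    "luminal_A": [0.7, -0.1, 0.4, -0.5],
    "TP53_mutant": [-0.9, 0.3, -0.4, 0.7],
}

matrix = correlation_matrix(signatures)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Signatures that point the same way across pathway features correlate positively and would cluster together, as reported for MAP3K1 and the luminal A subtype.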


Druggable gene analysis 

We defined mutations in druggable tyrosine kinase domains, including 
in ERBB2 (a V777L and a 755-759 LRENT in-frame deletion 
homologous to gefitinib-sensitizing EGFR mutations in lung cancer”’), 
as well as in DDR1 (A829V, R611C), DDR2 (E583D), CSF1R (D735H, 
M875L), and PDGFRA (E924K). In addition, pleckstrin homology 
domain mutations were observed in AKT1 (C77F) and AKT2 (S11F) 
and a kinase domain mutation was identified in RPS6KB1 (S375F) 
(Supplementary Table 18). 


Discussion 


The low frequency of many significantly mutated genes presents an 
enormous challenge for correlative analysis, but several statistically 
significant patterns were identified, including the relationship between 
MAP3K1 mutation, luminal A subtype, low tumour grade and low 
Ki67 proliferation index. On this basis, for patients with MAP3K1 
mutant luminal tumours, neoadjuvant aromatase inhibitor treatment could 
provide a favourable option. In contrast, tumours with TP53 mutations, 
which are mostly aromatase inhibitor resistant, would be more appropriately 
treated with other modalities. MAP3K1 activates the ERK 
family; thus, loss of ERK signalling could explain the indolent nature 
of MAP3K1-deficient tumours’. However, MAP3K1 also activates 
JNK through MAP2K4, which also can be mutated**. Loss of JNK 
signalling produces a defect in apoptosis in response to stress, which 
would hypothetically explain why these mutations accumulate’””°. 
PIK3CA harboured the most mutations (41.3%) but was associated with 
neither clinical nor Ki67 response, confirming our earlier report*’. 
However, the positive association between MAP3K1/MAP2K4 muta- 
tions and PIK3CA mutation at both the mutation and pathway levels 
suggests cooperativity (Fig. 4a). 

The finding of multiple significantly mutated genes linked previ- 
ously to benign and malignant haematopoietic disorders suggests that 
breast cancer, like leukaemia, can be viewed as a stem-cell disorder 


a  Luminal subtype and histology grade

Gene    Expression/histopathology variable  Mutation frequency*  Set 1 P†  Set 2 P†      Whole set FDR P‡
TP53    Luminal subtype A                   9.3% (13/140)        0.001     0.46          0.041
        Luminal subtype B                   21.5% (38/177)
TP53    Histological grade I                4.5% (3/66)          0.05      0.067         0.02
        Histological grade II/III           19.2% (48/250)
MAP3K1  Luminal subtype A                   20.0% (28/140)       0.018     0.028         0.005
        Luminal subtype B                   6.2% (11/177)
MAP3K1  Histological grade I                25.8% (17/66)        0.061     0.011         0.005
        Histological grade II/III           8.8% (22/250)
CDH1    Histological type ductal            5.9% (10/169)        0.418     2.8 × 10^-11  3.9 × 10^-10
        Histological type lobular           50.0% (20/40)

b  Mutation and Ki67 index

Gene    Ki67 variable  Wild type mean||  Mutant mean||  Set 1 P¶   Set 2 P¶  Whole set FDR P‡
TP53    Baseline       13.1              25.1           3:7-<10F°  0.012     0.0003
        Surgery        14                4              0.0002     0.014     0.001
        % change       -89.2             -84.3          0.09       0.28      0.24
MAP3K1  Baseline       15.8              8.1            0.049      0.001     0.002
        Surgery        1.86              0.75           0.11       0.1       0.05
        % change       -88.3             -90.5          0.49       0.65      0.55
GATA3   Baseline       14.8              11.5           0.13       0.95      0.56
        Surgery        1.95              0.38           0.001      0.23      0.012
        % change       -86.8             -96.9          0.003      0.08      0.012

* Mutation percentage (mutant cases/total cases in a category); counts are based on all cases (Set 1 and Set 2 combined). 
† Unadjusted P value from Fisher’s exact test or chi-square test as appropriate. 
‡ Benjamini-Hochberg false discovery rate (FDR)-adjusted P value using all cases (Set 1 and Set 2 combined). 
§ Only 77 cases in Set 1 had CDH1 sequencing results. 
|| Geometric means are based on all cases (Set 1 and Set 2 combined). 
¶ Unadjusted P value from Wilcoxon rank sum test. 

that produces indolent or aggressive tumours that display varying 
phenotypes depending on differentiation blocks generated by differ- 
ent mutation repertoires”. Whereas only MLL3 showed statistical 
significance in the analysis of 46 WGS cases, multiple mutations in 
genes related to histone modification and chromatin remodelling are 
worth noting (Supplementary Table 19). An array of coding muta- 
tions and structural variations was discovered in methyltransferases 
(MLL2, MLL3, MLL4 and MLL5), demethylases (KDM6A, 
KDM4A, KDM5B and KDM5C), and acetyltransferases (MYST1, 
MYST3 and MYST4). Furthermore, our analysis identified several 
adenine-thymine (AT)-rich interactive domain-containing protein 
genes (ARID1A, ARID2, ARID3B and ARID4B) that harboured muta- 
tions and large deletions, reinforcing the role of members from the 
SNF/SWI family in breast cancer. 

Pathway analysis enables the evaluation of mutations with low 
recurrence frequency where statistical comparisons are conventionally 
underpowered. For example, the eight samples with MAP2K4 muta- 
tions were sufficient to derive a reliable pathway-based gene signature 
in PARADIGM that aligns with MAP3K1. This approach also pointed 
to a putative connection between MALAT1 and the TP53 pathway. 
Finally, we provide evidence that transcriptional associations to Ki67 
response reside in a connected network under the control of several key 
‘hub’ genes including MYC, FYN and MAP kinases, among others. 
Targeting these hubs in resistant tumours could produce therapeutic 
advances. In conclusion, the genomic information derived from 
unbiased sequencing is a logical new starting point for clinical invest- 
igation, where the mutation status of an individual patient is deter- 
mined in advance and treatment decisions are driven by therapeutic 
hypotheses that stem from knowledge of the genomic sequence and its 
possible consequences. However, the accrual of large numbers of 
patients and the use of comprehensive sequencing and gene expression 
approaches will be required because of the extreme genomic hetero- 
geneity documented by this investigation. 


METHODS SUMMARY 


Clinical trial samples were accessed from the preoperative letrozole phase 2 study 
(NCT00084396)’ that investigated the effect of letrozole for 16 to 24 weeks on 
surgical outcomes and from the American College of Surgeons Oncology Group 
(ACOSOG) Z1031 study (NCT00265759)* that compared anastrozole with 
exemestane or letrozole for 16 to 18 weeks before surgery (REMARK flow charts, 
Supplementary Fig. 1). Baseline snap-frozen biopsy samples with greater than 
70% tumour content (by nuclei) underwent DNA extraction and were paired with 
a peripheral blood DNA sample. Two formalin-fixed biopsies were obtained at 
baseline and at surgery, and were used to conduct oestrogen receptor and Ki67 
immunohistochemistry as previously published. Paired-end Illumina reads from 
tumours and normal samples were aligned to NCBI build36 using BWA. Somatic 
point mutations were identified using SomaticSniper, and indels were identified 
by combining results from a modified version of the Samtools indel caller 
(http://samtools.sourceforge.net/), GATK and Pindel. Structural variations were 
identified using BreakDancer and SquareDancer (unpublished). All putative 
somatic events found in 46 cases were validated by targeted custom capture arrays 
(Nimblegen)/Illumina sequencing and all tier 1 mutations for 46 WGS cases also 
were validated using PCR/454 sequencing. All statistical analyses, including 
significantly mutated gene, mutation relation and clinical correlation were done 
using the MuSiC package and/or by standard statistical tests (Supplementary 
Information). Pathway analysis was performed with PathScan, GeneGo Metacore 
(http://www.genego.com/metacore.php) and PARADIGM. A complete descrip- 
tion of the materials and methods used to generate this data set and results is 
provided in the Supplementary Methods section. 
Large-scale prediction and testing of 
drug activity on side-effect targets 


Eugen Lounkine, Michael J. Keiser, Steven Whitebread, Dmitri Mikhailov, Jacques Hamon, Jeremy L. Jenkins, 
Paul Lavan, Eckhard Weber, Allison K. Doak, Serge Côté, Brian K. Shoichet & Laszlo Urban 


Discovering the unintended ‘off-targets’ that predict adverse drug reactions is daunting by empirical methods alone. 
Drugs can act on several protein targets, some of which can be unrelated by conventional molecular metrics, and 
hundreds of proteins have been implicated in side effects. Here we use a computational strategy to predict the 
activity of 656 marketed drugs on 73 unintended ‘side-effect’ targets. Approximately half of the predictions were 
confirmed, either from proprietary databases unknown to the method or by new experimental assays. Affinities for 
these new off-targets ranged from 1 nM to 30 µM. To explore relevance, we developed an association metric to prioritize 
those new off-targets that explained side effects better than any known target of a given drug, creating a drug-target- 
adverse drug reaction network. Among these new associations was the prediction that the abdominal pain side effect of 
the synthetic oestrogen chlorotrianisene was mediated through its newly discovered inhibition of the enzyme 
cyclooxygenase-1. The clinical relevance of this inhibition was borne out in whole human blood platelet aggregation 
assays. This approach may have wide application to de-risking toxicological liabilities in drug discovery. 


Adverse drug reactions (ADRs) can limit the use of otherwise effective 
drugs. Next to lack of efficacy, they are the leading cause of attrition 
in clinical trials of new drugs and are more prominent still in the 
failure of molecules to advance from pre-clinical research into human 
trials. Some ADRs are caused by modulation of the primary target 
of a drug; others result from non-specific interactions of reactive 
metabolites. In many cases, however, ADRs are caused by unintended 
activity at off-targets. Notorious examples of off-target toxicity include 
that of the appetite suppressant fenfluramine-phentermine (fen-phen), 
which was withdrawn from the market after numerous patient 
deaths. These were due to the activation of the 5-hydroxytryptamine-2B 
(5-HT2B) receptor by one of its metabolites, norfenfluramine, leading 
to proliferative valvular heart disease. Similarly, well-known drugs, 
such as the antihistamine terfenadine, have been withdrawn because 
they caused arrhythmias and death, which have been attributed to 
their off-target inhibition of the human ether-a-go-go-related gene 
potassium channel (hERG, also known as KCNH2). Prediction of 
unknown off-target drug interactions might prevent such disastrous 
drug toxicities, which are often detected only after fatalities in the 
clinic, and might allow safer molecules to be prioritized for pre-clinical 
development. Methods to systematically predict off-targets, and associate 
these with side effects, have thus attracted intense interest, 
frequently in the form of either chemical genomics or informatics 
approaches. 

Whereas the informatics methods have never been tested system- 
atically on a large scale, in principle they can be deployed against 
thousands of targets. Here we present a large-scale, prospective evalu- 
ation of safety target prediction using one such method, the similarity 
ensemble approach (SEA). SEA calculates whether a molecule will 
bind to a target based on the chemical features it shares with those of 
known ligands, using a statistical model to control for random 
similarity. Because SEA relies only on chemical similarity, it can be 
applied systematically and, for those targets that have known ligands, 
comprehensively. For 656 drugs approved for human use (Supplementary 
Table 1), targets were predicted from among 73 proteins 
(Supplementary Table 2 and Methods) with established association with 
ADRs, for which assays were available at Novartis. Encouragingly, 
many of the predictions were confirmed, often at pharmacologically 
relevant concentrations. This motivated us to develop a guilt-by- 
association metric that linked the new targets to the ADRs of 
those drugs for which they are the primary or well-known off-targets, 
creating a drug-target-ADR network. The applicability and the 
limitations of this approach will be considered. 


Testing the predictions 


The 656 drugs were computationally screened for their likelihood to 
bind to 73 targets (Supplementary Table 2) using SEA. The targets 
belong to the Novartis in vitro safety panels based on their association 
with ADRs. Here we insisted that they also be described in the 
ChEMBL database, enabling correspondence with SEA predictions 
(Supplementary Table 2). ChEMBL annotates more than 285,000 
ligands modulating more than 1,500 different human targets with 
affinities better than 30 µM. SEA calculated the similarity of each drug 
versus each set of ligands for the 73 targets, comparing the overall set 
similarity to that expected at random. For instance, the 
sodium channel blocker aprindine loosely resembled the set of 
histamine H, ligands; although no single H, ligand was strongly 
similar to the drug (Table 1), the overall similarity of the set was much 
greater than expected at random, leading to a highly significant SEA 
expectation value (E value) of 5 × 10~*° between aprindine and H, 
receptor ligands. Only 1,644 of the more than 47,000 possible drug- 
target pairs had significant E values. Of these, 403 were already known 
in ChEMBL and so were trivially confirmed; we do not consider these 
further. Of the remaining 1,241 predictions, 348 (28%) were unknown 
to ChEMBL, but could be found in proprietary ligand-target 
databases that were unavailable to SEA (see Methods). The remaining 
Table 1 | New drug-off-target predictions confirmed by in vitro experiment 


Drug | Closest ChEMBL molecule | Tc value | Target | SEA E value | IC50 (µM) | Closest known target | BLAST E value 
Representative, confirmed predictions are shown. Tc values were determined using ECFP_4-based molecular similarity to the closest ChEMBL reference molecule in the target set. The closest known target is a known 
target of the drug that has highest sequence similarity to the predicted target. The BLAST E value is based on the sequence identity of the predicted target to the closest known target; values greater than 10° represent unrelated proteins. 


893 predictions represented previously unexplored drug-target asso- 
ciations. 

Of these predictions, 694 were tested at Novartis. For 478, activity was 
less than 25% at 30 µM; these were considered disproved. For another 65 
predictions, activity was between 25 and 50% at 30 µM; these were considered 
ambiguous. Finally, for 151 of the new drug-target predictions, 
half-maximum inhibitory concentration (IC50) values of less (better) 
than 30 µM were measured in concentration-response curves (Fig. 1a 
and Supplementary Fig. 1). In 125 cases, the drugs had an IC50 value 
better than 10 µM, and in 48 cases activities were sub-micromolar 
(Table 1, Supplementary Table 3 and Supplementary Fig. 1). In 
summary, of the 1,042 predictions that were tested (694 by assay, 
Figure 1 | Predicting off-targets, and their novelty. a, Success of SEA 
predictions. Known: predicted off-targets confirmed by proprietary databases. 
Confirmed: predictions tested in vitro achieving IC50 < 30 µM. Ambiguous: 
predictions with 25-50% activity at 30 µM. Inactive: <25% activity. b, SEA 
enriched for non-trivial similarity. Drugs were binned (grey) by lowest Tc values 
yielding valid SEA predictions. Hit rates of SEA (red) and 1NN (blue) are shown. 
Error bars denote s.d. c, Non-intuitive (chlorotrianisene) and straightforward 
(medrysone) SEA predictions, with Tc values to closest references shown. 
Chlorotrianisene is only marginally similar to indomethacin, but is (correctly) 
predicted for COX-1. AR, androgen receptor. d, Sequence similarities of each 
confirmed drug off-target to the closest known target of the drug. 


348 by databases), 48% were confirmed either in proprietary databases, 
unknown to the method and to those undertaking the SEA calculation, 
or in Novartis assays in full concentration responses, and just under 
46% were disproved (Fig. 1a). 
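
The bookkeeping behind these percentages can be retraced in a few lines. The sketch below uses only the counts quoted in the text; the variable names and the pct helper are illustrative assumptions, not part of the original analysis.

```python
# Counts quoted in the text; names are illustrative only.
n_drugs, n_targets = 656, 73
possible_pairs = n_drugs * n_targets             # "more than 47,000" drug-target pairs

significant = 1644                               # pairs with significant SEA E values
known_in_chembl = 403                            # trivially confirmed, set aside
remaining = significant - known_in_chembl        # predictions carried forward
in_proprietary_dbs = 348                         # found in databases unavailable to SEA
unexplored = remaining - in_proprietary_dbs      # previously unexplored associations

assayed = 694                                    # unexplored predictions tested at Novartis
inactive, ambiguous, confirmed_by_assay = 478, 65, 151

tested = assayed + in_proprietary_dbs            # all predictions with an outcome
confirmed = confirmed_by_assay + in_proprietary_dbs


def pct(part: int, whole: int) -> float:
    """Percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(possible_pairs, remaining, unexplored, tested)
print("confirmed %:", pct(confirmed, tested))
print("disproved %:", pct(inactive, tested))
print("ambiguous %:", pct(ambiguous, tested))
```

Read this way, the 48% confirmation rate combines the database hits with the assay hits, whereas the 46% disproved rate comes from the assays alone.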

In assessing these results, one would like to compare the true pre- 
dictions with the false-positive and the false-negative predictions. 
Whereas this work offers guidance on false-positive predictions, we 
can only address false negatives for a few compounds (Supplementary 
Results). Among these was astemizole, which had affinities ranging 
from 0.1 to 9 µM on the 5-HT2A, 5-HT2B and 5-HT2C serotonin 
receptors and the histamine H, and dopamine D, receptors, as 
measured in other projects at Novartis. These targets were missed 
owing to a charge post-filter, separate from SEA itself, which excluded 
compounds with net charge dissimilar from the reference ligands. 
Astemizole was improperly assigned a charge of +2, wrongly differentiating 
it from the known ligands; the SEA E values linking 
astemizole to these targets were themselves between 10 ~° to 10”. 
Other failures could be attributed to SEA itself. For instance, 
promazine bound to the histamine H, and H; receptors with low to 
mid-nanomolar affinities, but the SEA E values at 10 *to 10 ° were 
below our significance cut-off. This work was undertaken with 
ChEMBL_2 as a source of ligand-target association; had we used 
the more recent ChEMBL_10, the H, receptor would have been pre- 
dicted with an E value of 10°” (see http://sea.bkslab.org), and had we 
used ChEMBL_12 and a newer version of SEA, both targets would 
have been predicted. Clearly, with its reliance on topology and on 
inference from known ligand-target associations, SEA will have false 
negatives. 

A key question is whether the new predictions were in any way 
surprising. One way to evaluate this is to compare the similarity of 
drugs predicted for new targets with the closest previously known 
ligand for that target. We used Tanimoto coefficients (Tc), which 
compare the groups in common between two molecules, here repre- 
sented by ECFP_4 fingerprints. Tc values between nearest molecules 
were small, often less than 0.4 (ref. 32); visual inspection of these pairs 
confirms the dissimilarity suggested by the low Tc values (Table 1). 
More systematically, SEA may be compared with a method that predicts 
targets based only on the one-nearest neighbour (1NN) model 
(Fig. 1b). For close analogues (Tc values > 0.7; Fig. 1c), the fraction of 
true positives was comparable between 1NN and SEA approaches 
(Fig. 1b). But across most similarity thresholds, SEA substantially 
outperformed 1NN, and by nearly twofold in the low similarity range. 
Thus, for the Rho kinase inhibitor fasudil, SEA predicted only the 
adrenergic α2A receptor, with an E value of 1.1 x 10 7, which was 
experimentally confirmed (IC50 = 4 µM). This occurred despite the 
low similarity of the closest known α2 ligand, which had a Tc value of 
0.37 to fasudil. Conversely, at this similarity threshold the 1NN model 
predicted nine targets, only three of which were confirmed 
(Supplementary Table 4). For chlorotrianisene, two of the three targets 
predicted by SEA were confirmed; conversely, at its 0.31 Tc for 
cyclooxygenase-1 (COX-1, also known as PTGS1) the 1NN model 
predicted ten targets, only two of which were confirmed. 
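
The contrast between a nearest-neighbour view and a set-level view of similarity can be illustrated with a toy calculation. The sketch below is not the SEA implementation: fingerprints are plain sets of integers rather than ECFP_4 bits, the 0.3 threshold is an arbitrary assumption, and the statistical model that converts a set-level score into an E value is not reproduced. All function names and data are hypothetical.

```python
def tanimoto(fp_a: frozenset, fp_b: frozenset) -> float:
    """Tc = |A intersect B| / |A union B| for fingerprints stored as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def nearest_neighbour_tc(drug_fp: frozenset, ligand_fps: list) -> float:
    """1NN view: only the single most similar reference ligand counts."""
    return max(tanimoto(drug_fp, fp) for fp in ligand_fps)


def set_raw_score(drug_fp: frozenset, ligand_fps: list, threshold: float = 0.3) -> float:
    """Set-level view: sum every pairwise Tc at or above a threshold (assumed value)."""
    scores = [tanimoto(drug_fp, fp) for fp in ligand_fps]
    return sum(tc for tc in scores if tc >= threshold)


# Toy data: target A has several weakly similar ligands,
# target B has one moderately similar ligand and two unrelated ones.
drug = frozenset({1, 2, 3, 4, 5, 6})
target_a = [frozenset({1, 2, 3, 7, 8, 9}),
            frozenset({4, 5, 6, 10, 11, 12}),
            frozenset({1, 3, 5, 13, 14, 15})]
target_b = [frozenset({1, 2, 3, 4, 20, 21}),
            frozenset({30, 31, 32}),
            frozenset({40, 41})]

for name, ligands in (("A", target_a), ("B", target_b)):
    print(name, nearest_neighbour_tc(drug, ligands), set_raw_score(drug, ligands))
```

In this toy case the 1NN view ranks target B first, because of its single closer ligand, whereas the set-level score favours target A, whose ligands are each only loosely similar to the drug. This mirrors the situation described for aprindine, where no single reference ligand was strongly similar but the set as a whole was.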

We also investigated how often the new off-target would have been 
obvious based on sequence similarity of the targets. We calculated 
the BLAST sequence similarity of predicted targets to any known 
target of a drug (Table 1 and Supplementary Table 3). Of the 151 new 
off-target predictions, 39 (26%) had BLAST E values greater (worse) 
than 10°, suggesting the previously known targets shared no 
sequence similarity with the new off-targets (Table 1, Supplementary 
Table 3 and Fig. 1d). For example, the anaesthetic dyclonine 
was shown to bind the histamine H2 receptor (HRH2), whereas the 
closest known target was the Nav1.8 channel (SCN10A), which has no 
significant sequence similarity (BLAST E value > 1) and is functionally 
unrelated to the H2 receptor. Similarly, the anti-nausea drug alosetron 
antagonized the 5-HT2B receptor with an IC50 of 18 nM, although 
this receptor has no sequence similarity to the ion channel targets of this 
drug (Table 1). Chlorotrianisene potently inhibits the enzyme COX-1, 
which is unrelated by sequence to the primary nuclear hormone receptor 
of this drug, the oestrogen receptor (Table 1). 


Associating in vitro targets with ADRs 


To assess the potential clinical relevance of the discovered targets 
systematically, we developed a quantitative score that associated in 
vitro activity with patient ADRs. We enumerated all possible target- 
ADR pairs for 2,760 drugs with available adverse event annotations, 
expressing as an enrichment score the co-occurrence of pairs that 
were more common than expected by chance (Supplementary 
Table 5). For example, ‘abdominal pain upper’ has been reported 
for 45 drugs that interact with COX-1. The ADR abdominal pain 
upper was linked with 6,046 drug-target pairs, whereas COX-1 was 
linked with 2,188 drug-ADR pairs; there were a total of 681,797 
target-ADR pairs overall. Thus the pair abdominal pain upper- 
COX-1 was enriched 2.3-fold above random (Methods), with a χ² 
P value of 9.9 × 10”. A total of 3,257 significant target-ADR associations 
were identified (Supplementary Table 5). 
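
The 2.3-fold figure can be reproduced from the four counts quoted above if the enrichment factor is read as observed co-occurrence divided by the co-occurrence expected under independence. That reading is an assumption made here for illustration (the exact definition is given in the Methods, which are not part of this section), and the function name is hypothetical.

```python
def enrichment_factor(co_occurrences: int, adr_pairs: int,
                      target_pairs: int, total_pairs: int) -> float:
    """Observed target-ADR co-occurrence over the count expected by chance.

    Assumes independence: expected = adr_pairs * target_pairs / total_pairs.
    """
    expected = adr_pairs * target_pairs / total_pairs
    return co_occurrences / expected


# 'Abdominal pain upper' and COX-1, using the counts from the text.
ef = enrichment_factor(co_occurrences=45, adr_pairs=6046,
                       target_pairs=2188, total_pairs=681797)
print(round(ef, 2))
```

Under this assumption about 19 co-occurrences would be expected by chance, so the 45 observed correspond to the reported enrichment of roughly 2.3.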

Having identified new off-targets for the drugs, and linked these 
with observed ADRs, we sought drug-target-ADR connections that 
illuminate the clinical relevance of the predictions. Of the 151 con- 
firmed new drug-target associations tested at Novartis, 82 were sig- 
nificantly associated with one or more ADR, resulting in a total of 247 
drug-target-ADR links. In 116 cases, the enrichment factor (EF) of 
the new drug-target-ADR link was stronger than that for any previ- 
ously known target (Table 2 and Supplementary Table 6). For 
example, prenylamine was found to bind the histamine H, receptor 
(HRH1), which we associate with a sedation ADR (EF = 4.9). By 
contrast, none of the known targets of prenylamine was associated 
Table 2 | Characteristic new, confirmed targets associated with ADRs of the drugs 


Drug name | Target | Activity (µM) | AUC (µM h), range (median) | Cmax (µM), range (median) | Adverse event | EF ratio | Alternative target | Comparable drug 
Chlorotrianisene | COX-1 | 0.16 | NA | NA | Abdominal pain upper | 2.32 | None | None 
 | | | | | Rash | 1.79 | None | None 
Clemastine | SLC6A4 | 0.42 | NA | NA | Sleep disorder | 2.15 | None | None 
Cyclobenzaprine | HRH1 | 0.02 | 0.16-4.10 (0.69) | 0.01-0.13 (0.06) | Ataxia | 1.73 | None | Desipramine 
 | | | | | Somnolence | 1.49 | None | Aripiprazole 
Diphenhydramine | SLC6A3 | 4.33 | 2.57-3.42 (3.00) | 0.26-0.26 (0.26) | Tremor | 2.02/1.90 | SCN10A | Citalopram 
Loxapine | CHRM2 | 1.12 | 0.03-0.43 (0.21) | 0.02-0.41 (0.14) | Tachycardia | 2.08/1.97 | CHRM1 | Sibutramine 
Methylprednisolone | PGR | 1.30 | 0.09-10.76 (1.28) | 0.06-2.11 (0.31) | Depression | 3.87/2.49 | NR3C1 | Flutamide 
Prenylamine | HRH1 | 7.87 | 0.12-0.12 (0.12) | 1.20-1.20 (1.20) | Sedation | 4.94 | None | None 
Ranitidine | CHRM2 | 5.56 | 5.66-121.90 (9.67) | 1.14-9.11 (2.12) | Constipation | 1.63 | None | Haloperidol 
Ritodrine | OPRM1 | 9.18 | 0.03-0.32 (0.11) | 0.01-0.15 (0.04) | Hyperhidrosis | 3.21 | None | Oxycodone 
ADRs are listed that are more strongly associated with the predicted target than any known target, together with pharmacokinetic data (AUC and Cmax), where available. Numbers in parentheses denote the median 
value, and the range denotes the minimal and maximal reported value. Where pharmacokinetics (PK) and pharmacodynamic activity (PD) were available, drugs have been identified that behave comparably to the 
predicted drug and also cause the adverse event. The EF ratio is the ADR-target enrichment for the predicted/best alternative known target ratio. Comparable drugs are those that are known to bind the predicted 
target (bold denotes predicted target is the primary target), share the ADR and behave similarly in terms of PK and PD (see Methods). NA, not available. 


with this side effect. For other cases, known targets represented an 
alternative explanation for an ADR. For instance, we found that 
diphenhydramine binds to the dopamine transporter (SLC6A3; 
Table 2), which is associated with tremor. Although tremor was also 
associated with one of the known targets of diphenhydramine, sodium 
channel SCN10A (EF = 1.9), its association with the dopamine transporter 
was higher (EF = 2.02), indicating a possible mechanistic link 
with the new off-target. Conversely, the ‘dry mouth’ side effect of 
diphenhydramine was better explained by its known antagonism of 
the M3 muscarinic receptor (CHRM3; EF = 2.45, Supplementary 
Table 5). 

We asked whether the affinity of a drug for its predicted ADR-target 
was relevant given its pharmacology, comparing the predictions 
against other drugs with similar pharmacodynamics and 
pharmacokinetics (Table 2). This was possible for 36 drug-target-ADR 
links (Supplementary Table 6). For instance, cyclobenzaprine 
was shown to bind to the histamine H1 receptor at 21 nM, whereas 
its median maximal plasma concentration (Cmax) was 61 nM; nine 
other drugs binding the H1 receptor in the nanomolar range with 
comparable Cmax values were found (Table 2). Although some of the 
measured drug-target affinities were moderate, the pharmacokinetic 
data often confirmed that they were nevertheless relevant. For instance, 
the affinity of ranitidine (Zantac) for the M2 muscarinic receptor, 
which we associate with its constipation ADR, is only 5.6 µM. 
Nevertheless, with an area under the curve (AUC) value of 5.7 to 
122 µM h (minimal and maximal reported values) in plasma, this association 
seems plausible. Similarly, diphenhydramine has an IC50 value 
of only 4.3 µM against the dopamine transporter, a target that we 
associate with the tremor side effect of the drug. Nevertheless, the 
AUC value of diphenhydramine of 2.6 to 3.4 µM h supports the relevance 
of its modest IC50 value. 


Drug-target-ADR networks 


Network graphs help to visualize the new and known drug-target 
links, and the adverse events with which they are associated 
(Fig. 2a-c). For example, the oestrogen receptor (ESR1) modulator 
chlorotrianisene was found to inhibit COX-1, with an affinity sub- 
stantially better than its affinity for ESR1. Drugs that modulate the 
two proteins can share two of the adverse reactions of chlorotrianisene 
(‘erythema multiforme’ and ‘oedema’), but ‘rash’ and ‘abdominal pain 
upper’ link only to drugs inhibiting COX-1, and these are both asso- 
ciated with chlorotrianisene almost uniquely among the oestrogen 
receptor modulators (Fig. 2c and Supplementary Table 5). For 
prenylamine, a new G-protein-coupled receptor (GPCR) cluster (HRH1, 
OPRM1 and ADRB2) emerges that is unrelated to the primary ion 
channel activity of the drug, but uniquely links to its sedative and 
myocardial infarction ADRs (Fig. 2b). For domperidone, its known 
activity at dopamine receptors is associated with a Parkinsonism-like 
phenotype (‘hyperprolactinaemia’ and ‘extrapyramidal disorder’), 
whereas ‘somnolence’ associates only with the newly discovered opioid 
activity (Fig. 2a). 

Several of the drug-target-ADR associations that emerged were 
surprising. Among them were the association of the muscle relaxant 
cyclobenzaprine with somnolence, the H2 antagonist ranitidine with 
constipation, and chlorotrianisene with upper abdominal pain 
(Table 2). Cyclobenzaprine caught our attention because even its 
mechanism of action target for muscle relaxation has not been 
characterized, and its association with the off-target discovered here, 
the H1 receptor (IC50 = 20 nM), precedes the identification of its 
primary target. The central nervous system H1 receptor is strongly 
associated with somnolence, consistent with the ADR of the drug, and 
supported by its pharmacokinetics. Similarly, the constipation effect 
of ranitidine is consistent with its activity on the M2 muscarinic 
receptor. Although its affinity for M2 is moderate at 5.5 µM, the 
pharmacokinetics make this affinity relevant to this ADR. 

Perhaps the most compelling demonstration of a drug-target- 
ADR association is one in vivo, or in an accepted in vivo biomarker. 
The observation that chlorotrianisene was a potent COX-1 inhibitor 
seemed a reasonable explanation for the upper abdominal pain 
(epigastralgia) side effect provoked by the drug, and one that lent itself 
to direct testing in an accepted biomarker. Epigastralgia is a well- 
known ADR of non-steroidal anti-inflammatory drugs (NSAIDs), 
which inhibit the cyclooxygenase enzymes COX-1 and COX-2. 
COX-1 has housekeeping effects in the gastric mucosa, and its 
inhibition can lead to mucosal thinning and gastroduodenal ulceration, 
leading to upper gastric pain and the thousands of annual 
hospitalizations that are associated with NSAID use. NSAIDs also 
inhibit platelet aggregation by direct inhibition of their endogenous 
COX-1 enzyme. Intriguingly, this effect is unreported for other 
synthetic oestrogens, which, to the contrary, are more likely to promote 
platelet aggregation. A widely accepted model for platelet 
aggregation may be run ex vivo, in whole blood, allowing one to test 
for target engagement of COX-1 in this effect. 

Accordingly, collagen-induced platelet aggregation was measured in 
freshly drawn human blood from six healthy volunteer donors. 
Acetylsalicylic acid, the active ingredient in aspirin, inhibited platelet 
aggregation by 42-48% at 250 µM. The more potent NSAID 
indomethacin inhibited platelet aggregation by 50% at 50 µM. 
Chlorotrianisene inhibited platelet aggregation in whole blood with a 
potency almost indistinguishable from that of indomethacin, and more 
potently than acetylsalicylic acid (Fig. 2d). These results are consistent 
with an in vivo inhibitory activity of chlorotrianisene on COX-1, and 
with the epigastralgia that is among its common side effects. 


Drug and target promiscuity 

To investigate overall patterns of drug and target promiscuity, we inte- 
grated the experimental results from this and other Novartis studies. 
The most promiscuous target was the voltage-gated sodium channel 
Figure 2 | Off-target networks. a-c, Off-target networks for three drugs. 
Known targets of the drugs are grey, whereas newly predicted targets are blue; 
the adverse events associated with each are orange and red, respectively. Red 
adverse events are significantly (EF > 1, q value < 0.05) associated with the new 
off-targets. Targets related by sequence are connected by grey edges. 


(SCN5A), to which 70 out of 126 (56%) tested drugs bound (Fig. 3a). 
From a target family standpoint, however, this was an exception, as 
most other promiscuous targets were small molecule-recognizing 
GPCRs; of non-GPCRs, only SCN5A and the ion channel hERG were 
targeted more than average (>13% of drugs tested). Transporters had 
mid-range promiscuity; enzymes, nuclear receptors and ligand-gated 
ion channels were less promiscuous, whereas peptide-recognizing 
receptors were hit least of all (Supplementary Tables 2 and 7). 

Inverting this analysis, the most promiscuous drug, chlorhexidine, 
hit 34 out of 54 (64%) targets against which it was tested, and another 
nine drugs were active on more than 50% of their tested targets 
(Fig. 3b and Supplementary Table 8). Twenty-five drugs bound to 
proteins from among all major target classes. Highly promiscuous 
drugs were often lipophilic and cationic at physiological pH 
(Fig. 3b). 


Predicting off-targets and adverse events 


This study begins to quantify drug polypharmacology at scale: the 656 
drugs considered here each modulated an average of seven safety 
targets, sometimes across several classes, and more than 10% of the 
drugs acted on nearly half (45%) of the 73 targets (Fig. 3b). It is 
sobering that this promiscuity is observed for approved human drugs, 
which have typically already been optimized to minimize toxicity. For 
lead molecules that are progressing towards the clinic, this level of 
off-target promiscuity might be higher still. Anticipating these 
off-targets is difficult, as they can be unrelated in sequence and structure 
d, Chlorotrianisene inhibits platelet aggregation. Two independent 
experiments (red and blue) are shown for chlorotrianisene and indomethacin, 
respectively. ASA, acetylsalicylic acid (positive control); vehicle denotes 
negative control. *P < 0.05, **P < 0.01 (paired Student’s t-test differences 
compared with vehicle control). Error bars denote s.d. 


to the primary targets of a drug, and even known target-ADR asso- 
ciations are not always straightforward. Two results of this study 
begin to address these challenges. First, of the 1,042 predicted 
drug-target associations that were tested, 48% were confirmed 
(Fig. 1a). With 46% of the predictions disproved, the method remains 
imperfect, but this rate may nevertheless be high enough to prioritize 
compound classes and targets for testing. Second, a guilt-by- 
association metric can link off-targets with ADRs. A three-way 
association between drugs, molecular targets and ADRs may be 
systematically calculated and interpreted (Fig. 2a-c). 

Surprisingly, drugs often modulated off-targets unrelated to their 
primary target. Of the 151 off-targets that were confirmed by new 
experiment, 39 were unrelated by sequence to any of the known drug 
targets (Fig. 1d). For example, the antitussive clemastine and the 
antihistamine diphenhydramine (an active ingredient in products 
such as Tylenol PM), both of which act on the histamine H, GPCR, 
also modulate the serotonin transporter (5-HTT, also known as 
SLC6A4), to which the primary target is unrelated by sequence or 
structure. Conversely, the serotonin transporter inhibitor sertraline 
acts on the histamine H, GPCR. The activity of drugs on targets that 
are unrelated by sequence or structure to their primary targets can 
seem capricious and certainly makes prioritization of likely targets 
more difficult. A ligand-based approach offers an orthogonal view of 
target relationships and so can illuminate similarities that are opaque 
from a molecular biology perspective. The converse is also true, and 
the two views will often be complementary. 
Figure 3 | Target and drug promiscuity. 

a, Target promiscuity. Targets are sorted based on 
the percentage of drugs hitting the target below 
30 uM (indicated by numbers next to target 
names). Colours code for seven distinct target 
classes. b, Promiscuous drugs are often 
hydrophobic and cationic. Each point represents 
one drug. Ionization at pH 7.4 and AlogP 
(calculated octanol-water partitioning coefficient) 
values were calculated from drug structures. Hit 
rate denotes the percentage of targets the drug 
The association between chlorotrianisene, COX-1 and epigastralgia 
illustrates the potential of the approach. The therapeutic target of 
chlorotrianisene, the oestrogen nuclear hormone receptor, bears no 
sequence or structural similarity to the COX-1 enzyme, but the likelihood 
of cross-activity between the two targets is articulated by ligand 
similarity (Table 1). Correspondingly, the linking of abdominal pain 
and COX-1 only emerges when one quantitatively compares the ADRs 
of known COX-1 inhibitors to what one would expect at random. The 
potent inhibition of platelet aggregation by chlorotrianisene in whole 
blood (Fig. 2d) is consistent with the systemic, in vivo activity of this 
drug at relevant concentrations on COX-1. 

Certain shortcomings of the method should not escape the reader’s 
attention. Almost 46% of the predicted drug-target associations were 
disproved, and the method, which is inference-based, undoubtedly 
has important false negatives. As such, SEA cannot replace 
compound-target testing. What it can do is identify compounds early 
in development for possible liabilities that would ordinarily be iden- 
tified only much later in drug progression. Similarly, the guilt-by- 
association method is inference-based and mechanism-naive, and 
so will miss some target-ADR associations and make others that 
are invalid. Also, only some side effects fall into the remit of this 
approach, which assumes an off-target mechanism. Side effects, like 
other pharmacological events, have a strong exposure component, 
and can result from complex interactions with regulatory networks. 
Thus, topical drugs such as econazole or chlorhexidine, although pro- 
miscuous in vitro, have fewer ADRs than expected because they never 
achieve sufficient systemic exposure in vivo. Conversely, a drug such as 
tacrolimus might be relatively selective in vitro, but is associated with 
several side effects owing to its broad immunosuppressant effects. 

These caveats should not obscure the potential of this approach to 
predict and understand drug side effects. The method was used 
automatically at scale, without human intervention. Whereas its pre- 
dictions were sometimes disproved, they were just as often confirmed. 
If some of the targets it suggested required no imaginative leap—as 
when a steroid was predicted for a new nuclear hormone receptor—a 
quarter of the confirmed targets were unrelated by sequence or struc- 
ture to any of the known targets of the drugs (Fig. 1c, d, Table 1 and 
Supplementary Table 2). Pragmatically, the ability to calculate drug- 
target-ADR networks provides a tool to anticipate liabilities among 
candidate drugs being advanced towards the clinic, or yet earlier, for 
prioritization of chemotypes in preclinical series. If such networks 
cannot replace direct experimentation, they can usefully prioritize 
off-targets for consideration. As we struggle to develop new therapeutics, 
this and related approaches can identify molecules with 
liabilities early in their development, and so focus effort on those 
candidates that are least subject to them. 
METHODS SUMMARY 


A collection of 656 on-hand US Food and Drug Administration-approved drugs 
was computationally screened against a panel of 339 molecular targets represent- 
ing the species-specific expansion of 73 target assays used in Novartis safety 
panels. Each of the 339 target proteins was represented by its set of known ligands, 
as extracted from the ChEMBL_2 database. The two-dimensional structural 
similarity of a drug to the ligand set of a target was quantified as an E value using 
the SEA, then subjected to a molecular charge filter. Predictions were tested 
retrospectively using proprietary databases including GeneGo Metabase, 
Thomson Reuters Integrity, Drugbank and GVKBio; new predictions were 
tested prospectively in Novartis in vitro assays. Binding assays, and, when 
available, functional assays were performed including scintillation proximity, 
fluorometric imaging, filtration, fluorescence polarization, patch clamp, time- 
resolved fluorescence resonance energy transfer, and homogeneous time-resolved 
fluorescence assays. Concentration-response curves were calculated using XLfit 
(v.2 or v.4, IDBS) or corresponding in-house software. All curves were redrawn 
using GraphPad PRISM v.5. Adverse drug reaction data were extracted from the 
World Drug Index and encoded using the Medical Dictionary for Regulatory 
Activities (MedDRA). Using target annotations from GeneGo Metabase, Integrity, 
Drugbank, ChEMBL and GVKBio, target-ADR pairs for all drugs were enumerated. 
Disproportionality analysis in conjunction with a chi-squared test for association 
was carried out for all drug-target pairs. The false discovery rate was controlled 
using the Benjamini-Hochberg correction for multiple hypothesis testing. 
Pharmacokinetic data were extracted from Integrity. For target and drug 
promiscuity analysis, combined external and internal target annotations were used. 
The computational workflow apart from SEA was implemented in Pipeline Pilot 
version 8 and statistical analyses were performed in R. Platelet aggregometry 
was performed in human blood for chlorotrianisene and indomethacin, with 
acetylsalicylic acid as a positive control. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 

Virtual target profiling of drugs. We assembled a set of 656 drugs (Supplementary 
Table 1) available for internal prospective testing together with 73 assay targets for 
which Novartis safety panel assays”* were available. To compare activity annotations 
across databases, each target was mapped to human genes using Entrez gene and 
ChEMBL target identifiers (Supplementary Table 2). For target prediction, the 73 
targets were represented by 339 orthologous proteins from human, rat, mouse, 
bovine and sheep, using the ChEMBL_2 database (released 25 March 2010); ligands 
for these targets with affinities ≤1 µM were grouped into sets for the SEA calculation. 

We computationally screened the 656 drugs against the 339-target panel, using 
1024-bit folded ECFP_4 (ref. 46) and 2048-bit Daylight” fingerprints indepen- 
dently, with the Tc value as the similarity metric. Tc values lie between 0 and 1, in 
which 1 corresponds to perfect overlap of two fingerprints. Where both fingerprints 
yielded the same SEA prediction, we took the prediction with the lower (that is, 
stronger) E value, unless otherwise noted. The maximum pair-wise Tc value was 
used in the 1NN model. 
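
As an illustration of the similarity metric, the following minimal Python sketch computes Tc between two fingerprints and the maximum pair-wise Tc that serves as the 1NN (one-nearest-neighbour) score. It assumes that fingerprints are supplied as sets of on-bit indices; the function names and the example bit sets are hypothetical and are not ECFP_4 or Daylight output.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient (Tc) of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_tc(drug: Set[int], ligands: Iterable[Set[int]]) -> float:
    """Highest Tc between a drug and any reference ligand of one target."""
    return max((tanimoto(drug, ligand) for ligand in ligands), default=0.0)


# Hypothetical toy fingerprints (on-bit indices only).
drug_fp = {1, 5, 9, 12}
ligand_fps = [{1, 5, 9, 12}, {1, 5, 40}, {7, 8}]
print(max_tc(drug_fp, ligand_fps))  # 1.0, because the first ligand is identical
```

A Tc of 1 is reached only when the two bit sets coincide, and disjoint bit sets give 0, in line with the range stated above.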

Predictions with E< 10~* were retained. As a final step, we subjected the SEA 
predictions to a pass/no-pass charge filter, to de-prioritize those predictions in which 
the total charge of the drug did not match the charges calculated for at least 5% of the 
known ligands of the predicted target*’*’. This resulted in 4,195 drug-ChEMBL 
target pairs that were subsequently mapped to the 73 target panel, resulting in 
1,644 unique predictions (the difference reflects the orthologous redundancies). 
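
The pass/no-pass charge filter described above can be sketched as follows. This is a minimal illustration only: it assumes that total charges have already been calculated as integers for the drug and for each known ligand, and the function name and example values are hypothetical.

```python
from typing import Sequence


def passes_charge_filter(drug_charge: int,
                         ligand_charges: Sequence[int],
                         min_fraction: float = 0.05) -> bool:
    """True if at least `min_fraction` of the target's known ligands
    carry the same total charge as the drug."""
    if not ligand_charges:
        return False  # assumption: no ligands means nothing to match against
    matching = sum(1 for charge in ligand_charges if charge == drug_charge)
    return matching / len(ligand_charges) >= min_fraction


# Hypothetical ligand set: 18 cationic ligands and 2 neutral ones.
ligand_charges = [1] * 18 + [0] * 2
print(passes_charge_filter(0, ligand_charges))   # True  (2/20 = 10% match)
print(passes_charge_filter(-1, ligand_charges))  # False (no ligand matches)
```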
Testing predictions. Many SEA predictions could be confirmed by interrogation of 
proprietary databases, available at Novartis but unavailable in San Francisco where 
the calculations were performed. These included the Thomson Reuters Integrity 
(http://thomsonreuters.com/products_services/science/science_products/a-z/ 
integrity, accessed January 2011), GeneGo Metabase (version 6.2, http://www. 
genego.com, accessed January 2011), and GVKBio (http://www.gvkbio.com/, 
accessed January 2011). In addition, we also compared predictions with the 
ChEMBL_11 (ref. 29) and Drugbank 3.0 (ref. 48) databases. For comparison across 
data sources, compounds were represented using the non-stereo-specific part of 
InChIKeys. 

For prospective evaluation of the remaining predictions we used binding and 
functional assay data from internal Novartis profiling efforts, carried out in parallel 
to the SEA study. For some targets, functional assays were also available. Full 
concentration-response curves were plotted for any compound with at least 
50% inhibition or activity at the maximal tested concentration (30 µM; 
Supplementary Fig. 1). For detailed assay descriptions, see Supplementary Methods 
and Supplementary Table 2. 
Comparison to a 1NN model. We evaluated two 1NN models, using either 
ECFP_4 or Daylight fingerprints. Each drug was compared with all reference 
ligands of a target. The highest Tc value resulting from that comparison was 
assigned to the drug-target pair. For each drug, we identified the lowest Tc value 
that yielded valid SEA predictions using the respective fingerprint, and collected 
all drug-target pairs with Tc scores above that threshold, irrespective of the SEA E 
value. We counted the predictions confirmed in the proprietary databases or by 
experiment at Novartis. We calculated an adjusted hit rate: 


Adjusted hit rate = (number of true positives + 1)/ 
(number of total predictions + 1). 


The additional count for both numerator and denominator distinguishes cases in 
which no predictions were confirmed, but one method or the other predicted 
fewer targets. For example, SEA predicted four targets for bezafibrate, none of 
which were confirmed (Supplementary Table 4). However, at the corresponding 
Tc threshold of 0.37, the ECFP_4 1NN model identified 12 potential targets, none 
of which was confirmed. The adjusted fraction for SEA is 0.2 ((0+1)/(4+1)), 
whereas the adjusted fraction for the 1NN model is 0.077 (1/13). We monitored 
the average adjusted hit rate for ten similarity threshold bins ranging from 0 to 1. 
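
The bezafibrate example can be reproduced with a few lines of Python. The function name is hypothetical; the arithmetic follows the adjusted hit rate defined above.

```python
def adjusted_hit_rate(true_positives: int, total_predictions: int) -> float:
    """(true positives + 1) / (total predictions + 1)."""
    return (true_positives + 1) / (total_predictions + 1)


sea_rate = adjusted_hit_rate(0, 4)      # SEA: four predictions, none confirmed
one_nn_rate = adjusted_hit_rate(0, 12)  # 1NN: twelve predictions, none confirmed
print(round(sea_rate, 3), round(one_nn_rate, 3))  # 0.2 0.077
```

Because of the added count, a method that makes fewer unconfirmed predictions scores higher than one that makes more, even when neither has a confirmed hit.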
BLAST target comparison. To investigate how closely the predicted targets were 
related to already known primary or off-targets, we calculated a target similarity 
matrix for all known and predicted targets found in our study. Amino acid sequences 
of all targets were assembled from UniProt. Sequences were compared in a pair-wise 
manner using BLASTp as implemented in Pipeline Pilot (version 8, http://www. 
accelrys.com)*'. Target sequence similarity was quantified using BLAST E values. 
Target pairs with values smaller than 10° ° were considered related by sequence. 
Target and drug promiscuity. Targets were classified using the ChEMBL target 
taxonomy, which consists of eight levels. The first three levels were used here to 
distinguish between small molecule and peptide GPCRs, as well as voltage- and 
ligand-gated ion channels (Supplementary Table 5). In-house and literature 
drug-target annotations were combined, and annotations with IC50 < 30 µM 
were counted as hits. The lipophilicity of drugs was assessed by calculating 
AlogP values in Pipeline Pilot. Negative values correspond to hydrophilic com- 
pounds, and positive values to lipophilic compounds. 


Associations between targets and ADRs. ADRs were extracted from the World 
Drug Index (WDI, http://thomsonreuters.com/products_services/science/science_products/a-z/world_drug_index/, 
accessed March 2011) and mapped to preferred terms from the Medical Dictionary 
for Regulatory Activities (MedDRA). 
MedDRA organizes adverse reaction terms in a hierarchy reaching from low-level 
terms to system organ classes at the highest level. Original WDI terms were first 
mapped to low-level terms in the MedDRA hierarchy using text mining compo- 
nents in Pipeline Pilot (version 8). Low-level terms serve as synonyms for pre- 
ferred terms in MedDRA. These preferred terms were used to identify each 
adverse event uniquely. For example, the low-level terms ‘dry mouth’ and ‘xer- 
ostomia’ both map to the preferred term dry mouth. This resulted in 1,685 unique 
ADR terms, 2,760 unique drug structures with ADR annotations, and a total of 
51,101 drug-ADR pairs. Using drug-target associations from databases used for 
testing predictions, we enumerated all target-ADR pairs (681,797 total). The 
assessment was done separately for binding, antagonist and agonist annotations. 
Assuming that each ADR could potentially occur owing to any of the targets hit 
by the drug, we enumerated all possible target-ADR pairs for each drug. Target- 
ADR pairs occurring more than ten times were retained. The number of observa- 
tions for each unique pair was then compared with the expected number of 
observations given the overall distribution of activity and adverse effect annota- 
tions. An enrichment score was calculated for each target-ADR pair: 


EF = p/(A × T/P) 


in which p is the co-occurrence of target X and ADR Y, A is the number of times 
ADR Y was linked to any drug-target pair, T is the number of times target X was 
linked with any drug-ADR pair, and P is the total number of target-ADR pairs. 
To assess the statistical significance of found associations, we applied the chi- 
squared test for association based on contingency tables calculated for each 
unique target-ADR pair with an EF score greater than one. The false discovery 
rate was controlled using Benjamini-Hochberg correction in R (version 2.12, 
http://www.r-project.org)*. P values and q values (that is, P values corrected 
for multiple hypothesis testing), as well as the χ² statistic, were calculated using 
the R statistical package. A total of 3,257 associations with a q value of <0.05 were 
retained (Supplementary Table 5). 
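
The enrichment and significance steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the R code used in the study: the layout of the 2 × 2 contingency table is assumed (the text does not specify it), no continuity correction is applied, and all function names and counts are hypothetical.

```python
import math
from typing import List, Tuple


def enrichment_factor(p: int, A: int, T: int, P: int) -> float:
    """EF = p / (A * T / P): observed co-occurrence over its expectation."""
    return p / (A * T / P)


def chi2_2x2(p: int, A: int, T: int, P: int) -> Tuple[float, float]:
    """Pearson chi-squared statistic and P value (1 d.f.) for the assumed table
    [[X and Y, Y without X], [X without Y, neither]]. Requires 0 < A, T < P."""
    observed = [[p, A - p], [T - p, P - A - T + p]]
    row_sums = [A, P - A]
    col_sums = [T, P - T]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / P
            stat += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-squared distribution with 1 d.f.
    return stat, math.erfc(math.sqrt(stat / 2.0))


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg q values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical counts: the pair co-occurs 20 times where 5 would be expected.
ef = enrichment_factor(p=20, A=100, T=50, P=1000)
stat, p_value = chi2_2x2(p=20, A=100, T=50, P=1000)
print(ef, round(stat, 2))  # 4.0 52.63
print([round(x, 3) for x in benjamini_hochberg([0.01, 0.04, 0.03, 0.005])])
```

Only pairs with EF greater than one would be passed to the test, and pairs whose q value falls below 0.05 would be retained, following the thresholds given above.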
Adverse reactions associated with predicted targets. Enrichment factors of 
predicted target-ADR pairs were compared with the association of ADRs with 
any known targets of each drug. We prioritized adverse reactions that were more 
strongly associated with the predicted than with any known target (that is, had a higher 
EF score). To prioritize further adverse reactions that were probably due to the 
newly predicted target we extracted pharmacokinetic data from Thomson 
Reuters Integrity. Maximal plasma concentration (Cmax) and cumulative 
concentration (AUC) values measured in humans were assembled. Activity data were 
assembled from quantitative sources (ChEMBL_11 and GVKBio) for drugs that 
were not part of the predictions, but shared ADRs with the prediction drugs. Drugs 
were identified for each prediction and associated ADR that satisfied the following 
three criteria: (1) they shared the ADR with the prediction drug; (2) they were not 
more than ten times more active at the predicted target; and (3) their Cmax value 
and/or AUC value was not more than ten times higher than for the prediction drug. 
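
The three criteria can be expressed as a simple filter. The sketch below is hypothetical throughout: the record type, field names and example values are invented for illustration, activity is assumed to be an IC50-like value (lower means more active), and 'and/or' in criterion (3) is read as 'at least one exposure measure available for both drugs'.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class DrugRecord:
    """Hypothetical record; field names are illustrative only."""
    name: str
    adrs: Set[str]
    ic50_um: float                # activity at the predicted target (lower = more active)
    cmax: Optional[float] = None  # maximal plasma concentration
    auc: Optional[float] = None   # cumulative concentration


def is_comparator(candidate: DrugRecord, prediction: DrugRecord,
                  adr: str, fold: float = 10.0) -> bool:
    """Apply criteria (1)-(3) to one candidate for one prediction drug and ADR."""
    # (1) Both drugs are annotated with the ADR.
    if adr not in candidate.adrs or adr not in prediction.adrs:
        return False
    # (2) The candidate is not more than `fold` times more active.
    if candidate.ic50_um < prediction.ic50_um / fold:
        return False
    # (3) At least one shared exposure measure is not more than `fold`
    #     times higher for the candidate (same units assumed).
    checks = []
    for attr in ("cmax", "auc"):
        cand, pred = getattr(candidate, attr), getattr(prediction, attr)
        if cand is not None and pred is not None:
            checks.append(cand <= fold * pred)
    return any(checks)


prediction = DrugRecord("prediction drug", {"nausea"}, ic50_um=1.0, cmax=2.0)
candidates = [
    DrugRecord("drug A", {"nausea"}, ic50_um=0.5, cmax=5.0),     # passes all three
    DrugRecord("drug B", {"nausea"}, ic50_um=0.05, cmax=5.0),    # fails (2)
    DrugRecord("drug C", {"nausea"}, ic50_um=0.5, cmax=50.0),    # fails (3)
    DrugRecord("drug D", {"dry mouth"}, ic50_um=0.5, cmax=5.0),  # fails (1)
]
print([c.name for c in candidates if is_comparator(c, prediction, "nausea")])
```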
Platelet aggregation inhibition. Human blood samples from six healthy volunteer 
male donors were used to perform platelet aggregometry with a multiplate imped- 
ance aggregometer (Dynabyte Medical) as follows: chlorotrianisene or indomethacin 
was added to whole blood at final concentrations of 0.5, 5 and 50 µM, and incubated 
at room temperature for 10 min; platelet aggregation was induced with 1 µg ml⁻¹ 
collagen and measured at 37 °C for 15 min; control aggregations were measured with 
vehicle only, and with 250 µM acetylsalicylic acid. Statistical analysis was performed 
Structure of yeast Argonaute with guide RNA 


Kotaro Nakanishi, David E. Weinberg, David P. Bartel & Dinshaw J. Patel 


The RNA-induced silencing complex, comprising Argonaute and guide RNA, mediates RNA interference. Here we report the 
3.2 Å crystal structure of Kluyveromyces polysporus Argonaute (KpAGO) fortuitously complexed with guide RNA originating 
from small-RNA duplexes autonomously loaded and processed by recombinant KpAGO. Despite their diverse sequences, 
guide-RNA nucleotides 1-8 are positioned similarly, with sequence-independent contacts to bases, phosphates and 
2'-hydroxyl groups pre-organizing the backbone of nucleotides 2-8 in a near-A-form conformation. Compared with 
prokaryotic Argonautes, KpAGO has numerous surface-exposed insertion segments, with a cluster of conserved insertions 
repositioning the N domain to enable full propagation of guide-target pairing. Compared with Argonautes in inactive 
conformations, KpAGO has a hydrogen-bond network that stabilizes an expanded and repositioned loop, which inserts an 
invariant glutamate into the catalytic pocket. Mutation analyses and analogies to ribonuclease H indicate that insertion of this 
glutamate finger completes a universally conserved catalytic tetrad, thereby activating Argonaute for RNA cleavage. 


RNA interference (RNAi) is a eukaryote-specific gene-silencing path- 
way triggered by double-stranded RNA (dsRNA)’”. In this pathway, 
the ribonuclease (RNase) III enzyme Dicer first cleaves the dsRNA 
trigger into small interfering RNAs (siRNAs), which have 5’-mono- 
phosphates and pair to each other with two-nucleotide 3’ over- 
hangs**®. The siRNA duplex is incorporated into the effector 
protein Argonaute (AGO), at which point one of the strands (desig- 
nated the passenger strand) is cleaved’. After the cleaved passenger 
strand is discarded, the resulting ribonucleoprotein complex (the 
RNA-induced silencing complex, or RISC) uses the remaining 
siRNA strand (designated the guide strand) to specify interactions 
with target RNAs’*"'. If sequence complementarity between guide 
and target is extensive, AGO again catalyses cleavage, resulting in 
‘slicing’ of the target RNA”. 

The first structures of full-length AGOs were of prokaryotic proteins 
from Pyrococcus furiosus’* and Aquifex aeolicus’*. Early structures 
revealed that the PIWI domain adopts an RNase H-like fold, thereby 
implicating AGO as the ‘slicer’ enzyme that mediates RNAi‘*’*”>. 
Because these prokaryotic enzymes bind 5'-phosphorylated guide 
DNAs rather than RNAs’*"*, subsequent structures featured the binary 
complex of Thermus thermophilus Ago (TtAGO) with guide DNA” 
and ternary complexes with target RNAs of varying length'*””. These 
studies shed light on the nucleation, propagation and cleavage steps 
of the AGO catalytic cycle’’”°. However, the physiological role of 
prokaryotic AGOs is enigmatic; the origin of the guide DNA is 
unknown, and bacteria lack recognizable components of the RNAi 
pathway”. Therefore, attention has turned to eukaryotic AGOs, which 
use RNA guides and have protein-binding partners absent in bac- 
teria’. Eukaryotic AGOs are also larger than prokaryotic AGOs 
because of insertion elements of unknown structure and function. 
Structures of individual domains and the MID-PIWI lobe within 
eukaryotic AGO have been determined**”*, but structural character- 
ization of the entire protein has remained a challenge. 

Although Saccharomyces cerevisiae lacks RNAi, some closely related 
budding-yeast species were recently shown to have retained RNAi, 
thereby offering fresh possibilities for the study of the eukaryotic 
pathway. We previously determined the structure and mechanism of 
Dicer from the budding yeast Kluyveromyces polysporus*® and thus 
turned our attention to the AGO of this species. 


Cleavage activity of budding-yeast AGO 

K. polysporus AGO (Ago1) has the four conserved domains (N, PAZ, 
MID, PIWI) and two linker regions (L1, L2) found in other AGOs 
(Fig. 1a). It also has an amino-terminal extension, predicted to be 
disordered, which we removed to facilitate crystallization. The result- 
ing protein, KpAGO, can substitute for the full-length protein when 
reconstituting RNAi in S. cerevisiae (Fig. 1b). 

KpAGO and other budding-yeast AGOs have acidic side chains at 
the three positions corresponding to active-site residues in slicing- 
competent AGOs*’ (Supplementary Fig. 1), which suggested that 
KpAGO might also cleave target RNAs. Indeed, after incubation with 
a single-stranded guide RNA, recombinant KpAGO cleaved a matched 
target RNA at the expected position (Fig. 1c). To examine whether 
slicing occurs in vivo, we performed degradome sequencing from a 
related RNAi-containing yeast, Saccharomyces castellii. This procedure 
identifies polyadenylated RNAs containing 5’-monophosphates, 
including products of AGO-catalysed slicing’***. Many AGO1-depend- 
ent degradome tags mapped to Y’-element transcripts (major targets of 
S. castellii RNAi’’), and these tended to pair to endogenous siRNAs in 
the register that implicated cleavage across from positions 10-11 of the 
guide RNA, which was diagnostic of slicing** (Supplementary Fig. 2). 
These results indicate that budding-yeast AGO functions as a slicer 
during endogenous RNAi, and with the in vitro results establish 
KpAGO as a eukaryotic slicer suitable for structure—function analyses. 
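
The cleavage register referred to above (across from guide positions 10-11) can be illustrated with a short Python sketch. It is a deliberate simplification: it handles only a perfectly complementary site, whereas slicing in vivo requires extensive rather than perfect pairing, and the function names and sequences are hypothetical.

```python
from typing import Optional, Tuple

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def slice_site(guide: str, target: str) -> Optional[Tuple[str, str]]:
    """Split `target` at the linkage across from guide positions 10-11.

    Returns (5' fragment, 3' fragment), or None if the target has no site
    perfectly complementary to the guide.
    """
    site = reverse_complement(guide)
    start = target.find(site)
    if start < 0:
        return None
    # The nucleotide paired to guide position 10 becomes the 5' end of the
    # 3' fragment; it lies len(guide) - 10 nucleotides into the site.
    cut = start + len(guide) - 10
    return target[:cut], target[cut:]


guide = "UGAGGUAGUAGGUUGUAUAGU"  # arbitrary 21-nucleotide example guide
target = "GGGAAA" + reverse_complement(guide) + "CCCUUU"
five_prime, three_prime = slice_site(guide, target)
print(len(five_prime), three_prime[:10])
```

The first ten nucleotides of the 3' fragment pair with guide positions 1-10, which is the register that degradome tags are expected to show when they derive from slicing.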


Structural architecture of eukaryotic AGO 

We crystallized recombinant KpAGO purified from Escherichia coli. 
Extensive screening identified several crystals that were free of 
twinning, one of which diffracted to 3.2 Å resolution. A crystal of 
selenomethionine-substituted KpAGO yielded reflections suitable 
for phasing by single-wavelength anomalous dispersion (Supplementary 
Table 1; representative electron density is shown in Supplementary 
Fig. 3). 

Figure 1 | Cleavage activity of budding-yeast AGO. a, Domain architectures 
of AGO proteins from T. thermophilus, Homo sapiens, Arabidopsis thaliana, 
Schizosaccharomyces pombe and K. polysporus. b, RNAi reconstituted in S. 
cerevisiae using K. polysporus genes. Median green fluorescent protein (GFP) 
intensity is plotted as a fraction of GFP-only control. Error bars, quartiles; 
dashed line, background fluorescence. c, Cleavage activity of KpAGO. RNAs 
labelled at a cap phosphate (red) and matching the guide (either perfectly or 
with mismatches to positions 10-11) were incubated with/without (+/−) 
KpAGO that had been pre-incubated in the presence/absence (+/−) of 
synthetic guide RNA. Product was resolved on a denaturing polyacrylamide gel 
alongside cap-labelled synthetic product (left) and RNA standards (migration 
shown on right; nt, nucleotide). 

The overall structure of KpAGO resembles the bilobal architecture 
of its prokaryotic counterparts, but with expansions throughout the 
protein (Fig. 2a and Supplementary Fig. 4). Of the 19 insertion 
segments not found in prokaryotic AGOs'*"*””, 11 were conserved 
segments (cS) found in all eukaryotic AGOs, albeit with some differ- 
ences in secondary structure and length, whereas the remaining eight 
were variable segments (vS) found in only some eukaryotic AGOs 
(Supplementary Fig. 1). All insertion segments are external, generating 
new surfaces for potential interactions with AGO-binding proteins. 


Autonomously loaded guide RNA 

After modelling the KpAGO protein, the Fo − Fc map revealed continuous 
residual electron density lying along the nucleic-acid-binding 
channel (Fig. 2b). This unanticipated density resembled that of an 
oligonucleotide and could be fit well with an RNA octamer (Fig. 2c 
and Supplementary Fig. 5a). Analysis of end-labelled polynucleotides 
extracted from soluble and crystalline KpAGO confirmed the pres- 
ence of small RNAs (Fig. 2d and Supplementary Fig. 5b), the high- 
throughput sequencing of which identified a diverse population with 
a bimodal length distribution centring at 12 and 17 nucleotides 
(Supplementary Table 2 and Supplementary Fig. 5c, d). 

The location of the copurifying RNAs suggested that they might 
represent functional guide RNAs. Supporting this interpretation, they 
had two features of budding-yeast guide RNAs: 5’ uridine enrichment 
(Fig. 2e) and the presence of 5’ monophosphate, indicated by both 
electron density (Fig. 2c) and a phosphatase-sensitive block of 5’-end 
labelling (Supplementary Fig. 5e)””. Moreover, the KpAGO prepara- 
tion sliced an RNA containing a site complementary to a copurifying 
17-nucleotide RNA comprising approximately 0.1% of our sequen- 
cing reads (Fig. 2f and Supplementary Fig. 5c). Slicing was at the 
anticipated linkage and sensitive to mismatches to guide nucleotides 
10-11. Reactions showed initial burst kinetics, as observed previously 
for metazoan AGOs, although addition of Triton enabled 
sustained product formation (Supplementary Fig. 5f), perhaps by 
facilitating a conformational change that promotes product release. 

Most copurifying RNAs mapped to the KpAGO expression 
plasmid (Fig. 2e and Supplementary Fig. 6) in a manner suggesting 
origins from siRNA-like duplexes loaded into KpAGO with subsequent 
passenger-strand cleavage (Supplementary Fig. 7). For this 
to occur, KpAGO must be able to load siRNA duplexes in the absence 
of RISC-loading factors. Indeed, purified KpAGO incubated with an 
siRNA duplex generated products diagnostic of passenger-strand 
cleavage (Fig. 2g) and formed active RISC able to slice cognate target 
RNA (Fig. 2h). Loading was more efficient with duplex than with 
single-stranded guide and occurred asymmetrically in a manner consistent 
with preference for 5’ uridine on the guide strand (Fig. 2h and 
Supplementary Fig. 8a-c). 

We conclude that KpAGO can autonomously load an siRNA 
duplex, lose the passenger strand and then slice targets. This conclusion 
counters the prevailing view that loading of siRNA duplexes to 
form functional RISC requires RISC-loading factors. We suspect 
that other AGOs might also autonomously load siRNA duplexes 
and that reports to the contrary resulted from assaying target-RNA 
slicing under conditions in which AGO retained inhibitory passenger- 
strand fragments. Autonomous loading explains how KpAGO RISC 
fortuitously formed in the absence of other RNAi proteins. In contrast 
to previous preparations of AGO complexes used for structural studies, 
the formation of KpAGO RISC through loading of a duplex resembles 
the physiological RISC-assembly pathway. From this perspective, the 
KpAGO structure reflects the natural state of eukaryotic RISC. 

Figure 2 | KpAGO architecture and copurifying RNA. a, KpAGO protein 
structure, with N (cyan), linker L1 (yellow), PAZ (violet), linker L2 (grey), MID 
(orange), and PIWI (green) domains in ribbon representation. Constant (cS) 
and variable (vS) insertion segments, blue and slate, respectively; disordered 
regions, dotted lines. b, Fo − Fc map (blue) contoured at 2.8σ before modelling 
RNA. c, Simulated-annealing omit map (blue) contoured at 3.5σ around final 
RNA model (red). d, Nuclease sensitivity of copurifying nucleic acid. End- 
labelled polynucleotides extracted from the indicated KpAGO samples were 
either untreated (−) or incubated with RNase (R) or DNase (D) before analysis 
on a denaturing gel. e, Nucleotide composition and origin of copurifying RNA. 
Sequences were analysed for enriched or depleted nucleotides (positive or 
negative bits, respectively) at each of the first eight positions (top). Numbers of 
sequencing reads mapping along each strand of the KpAGO expression 
plasmid are indicated (bottom, log scale). f, Cleavage activity guided by 
copurifying RNA. As in Fig. 1c, except labelled RNAs were designed to match 
the indicated copurifying RNA. g, Autonomous duplex loading and passenger- 
strand cleavage. Labelled single-stranded RNA (ssRNA) or siRNA duplex was 
incubated with or without KpAGO. h, Autonomous duplex loading and target 
cleavage. As in Fig. 1c, except KpAGO and RNA concentrations were reduced 
by 90% and 99.95%, respectively. 

Figure 3 | Organization of the guide RNA. a, b, The 5’-nucleotide-binding 
pockets of KpAGO (a) and TtAGO (b). Colours, as in Fig. 2a; protein, ribbon 
representation; highlighted residues and RNA, stick representation; O2’, O4’ 
and phosphate, white, cyan and yellow, respectively; hydrogen bonds, dotted 
lines. c, d, Interactions involving bases and either phosphates (c) or 2’-OH 
groups (d) of the seed region. Intermolecular (black) and intramolecular (blue) 
hydrogen bonds, dotted lines; hydrophobic interactions, van der Waals radii. 
e, Effects of guide-strand modifications on duplex loading and passenger- 
strand cleavage. KpAGO was incubated with siRNA duplexes with the 
indicated guide strands; p, 5’ monophosphate; upper case, 2’-OH; lower case 
bold, 2’-deoxy. The fraction of labelled passenger strand cleaved is plotted 
(average of three independent replicates; error bars, standard deviations; points 
connected by smooth curves). f, Superposition of guide-RNA nucleotides 2-8 
(red) on A-form RNA (cyan and blue). Dihedral angles (θ) between guide-RNA 
bases and those of A-form RNA are in parentheses. g, Solvent-exposed seed 
nucleotides (red). KpAGO surface is rendered, domains coloured as in Fig. 2a. 


Organization of the guide RNA 

Electron density corresponding to the base of nucleotide 1 was smaller 
than that corresponding to most other positions (Supplementary Fig. 9), 
which agreed with our sequencing results showing that KpAGO-bound 
RNAs were diverse but enriched for a 5’ uridine (Fig. 2e). Therefore, we 
modelled the first nucleotide as uridine and the next seven as adenine 
(the generic nucleotide used to minimize bias during refinement”) and 
refined the final structure as a KpAGO-pUAAAAAAAp binary complex 
(Supplementary Table 1 and Supplementary Fig. 10). 

The guide-strand nucleotides 2-8 run along the nucleic-acid- 
binding channel, from the MID domain to the L2 domain. These 
nucleotides, including their bases, have electron-density quality 
resembling that of the KpAGO protein, even though this density 
represents a composite of thousands of different RNAs. Thus, for this 
segment of the guide RNA, known as the seed region, diverse RNA 
sequences are all presented in essentially the same orientation. The 
electron density disappeared after the ninth nucleotide (Fig. 2b, c), 
even though most copurifying RNAs were longer than nine nucleotides 
(Supplementary Fig. 5d). This density loss indicates that guide-RNA 3’ 
halves are either disordered or adopt diverse sequence-specific con- 
formations. In addition, the PAZ domain is not well ordered, as 
observed in TtAGO complexes in which the PAZ domain has released 
the 3’ end of the guide’’, consistent with the idea that KpAGO holds 
the guide RNA without assistance from the PAZ domain. 

Like prokaryotic AGOs’*'7"?°, KpAGO recognizes the 5’ phosphate 
of the guide, the notable difference being that KpAGO uses the 
ammonium group of Lys 939 rather than a divalent cation (despite 
Mn²⁺ in the crystallization buffer) to neutralize the negative charge 
resulting from the close juxtaposition of the C-terminal carboxylate and 
phosphates 1 and 3 (Fig. 3a, b). The inserted carboxy terminus is 
anchored by Lys 939 and Lys 943 (Fig. 3a), with mutation of either 
residue impeding guide-RNA binding in Drosophila AGO1 (ref. 28). 
Another distinct facet involves Arg 1183, which hydrogen bonds with the 
C-terminal carboxylate and phosphate 4 (Fig. 3c). In the free Neurospora 
crassa QDE-2 (NcQDE-2) MID-PIWI structure’* the analogous arginine 
is in a disordered loop, suggesting that guide RNA recruits Arg 1183 to 
the 5’-phosphate-binding pocket. Notably, conservation of Lys 939 and 
Arg 1183 is restricted to eukaryotic AGOs (Supplementary Fig. 1). 

The Asn 897 main-chain amide interacts with the O2 carbonyl of 
the uridine at position 1 (Fig. 3a). Because analogous interactions with 
O2 of cytidine and N3 of purines would be isosteric, this hydrogen 
bond cannot explain the preference for a 5’ uridine. The preference 
might instead be attributed to the relatively weak stacking and 
pairing of uridine, which would facilitate the requisite flipping out 
of nucleotide 1 during siRNA loading’*”. 

KpAGO interacts with phosphates of the seed region primarily 
using contacts homologous to those observed in prokaryotic AGO 
complexes'*"'°*° (Fig. 3c). Structures of prokaryotic complexes, 
however, have not revealed intermolecular contacts to the guide- 
RNA 2'-OH groups. We find that KpAGO forms hydrogen bonds 
with most 2'-OH groups of the seed, using main-chain atoms at 
positions 2, 5 and 6 and hydroxyl groups of Thr 1186 and Tyr 681 
at positions 4 and 7, respectively (Fig. 3d). We also observe an intra- 
RNA hydrogen bond between the 2'-OH group at position 3 and 04’ 
at position 4, a type of interaction proposed to facilitate base-pair 
fluctuations in A-form RNA helices*". A second intra-RNA hydrogen 
bond involves the 2’-OH group at position 1 and a non-bridging 
oxygen of phosphate 2, as previously observed in Archaeoglobus 
fulgidus PIWI-siRNA complex structures'*”°. 

To examine the contributions of guide-strand 5’ phosphate and 
2'-OH groups, we monitored autonomous loading and passenger- 
strand cleavage of modified siRNA duplexes. Removing the 
monophosphate or substituting all guide-strand 2'-OH groups with 
2'-H (deoxy) greatly impaired activity (Fig. 3e and Supplementary 
Fig. 11), consistent with observations in transfected human cells’. 
To learn more about the 2’-OH groups contributing to this effect, we 
compared guide RNAs with deoxy substitutions at positions 1, 2-8, 
9-14, 15-21 and 22-23. Substitution of the 2'-OH group at position 1 
enhanced activity (perhaps by facilitating flipping out of nucleotide 1), 
whereas substitutions in all other regions impaired activity (Fig. 3e and 
Supplementary Fig. 11). Deoxy substitution at positions 2-8 impaired 
activity to a similar degree as at positions 22-23, which are presumably 
recognized by the PAZ domain”***. Thus, the 2'-OH groups within the 
seed region contribute to duplex loading or passenger-strand cleavage. 
Nonetheless, greater effects were observed at positions 9-14 and 
15-21, the understanding of which will require structural studies of 
additional states along the eukaryotic RISC-assembly pathway. 

Together, contacts to the phosphate and 2’-OH groups maintain 
the sugar-phosphate backbone of the single-stranded guide-RNA 
seed in a near-A-form conformation resembling that of the siRNA 
duplex (Fig. 3f). Maintaining this conformation pre-organizes the 
seed backbone for pairing to the target, as anticipated from studies 
of microRNA targeting’ and supported by structural and biophysical 
studies. Also as anticipated, the bases of the seed nucleotides are 
stacked, with Watson-Crick faces (particularly those of nucleotides 
2-4) displayed to solvent and accessible to nucleate pairing to target 
RNA (Fig. 3g). 

The surprising feature of the guide-RNA conformation was the 
tilting of the bases away from the orientation required for helical 
pairing (Fig. 3f). KpAGO makes hydrophobic contacts with the bases 
at positions 2, 5 and 6 while anchoring the sugar-phosphate backbone 
(Fig. 3c, d). Base 2 packs against Tyr 932 (Fig. 3d), which is conserved 
as Tyr or Thr in eukaryotic AGOs (Supplementary Fig. 1) and thus 
might represent a conserved hydrophobic interaction that facilitates
the flipping out of nucleotide 1 by preventing its stacking on base 2. As
observed in structures of prokaryotic AGO complexes'*"'*”°, base 2 is 
recognized at N3 (purines) or O2 (pyrimidines) by the side chain of 
Asn 935 (Fig. 3d), which is conserved throughout all AGOs. Bases 5 
and 6 are surrounded by a hydrophobic pocket comprising Ile 682, 
Ala 686, Leu 1147 and Lys 1148 (Fig. 3d). Bases 3 and 4 make no
contact with KpAGO but are nonetheless tilted because of continuous 
stacking of the seed bases (Fig. 3c, d). Untilting of the seed stack, 
which would accompany nucleation of target pairing at positions 
2-4, might disfavour contacts to Ile 682 and neighbouring residues, 
thereby facilitating repositioning of α16, a helix that would otherwise
block full seed pairing. Such changes in base tilting and α16 might
communicate the presence of target RNA. 


Potentially unobstructed guide-target pairing 

To compare the architectures of eukaryotic and prokaryotic AGOs, 
we structurally aligned each domain of KpAGO on its TtAGO 
counterpart. Except for the N domain, each of the domains super- 
imposed well (Supplementary Fig. 12). The structural difference 
between the N domains is attributed to cS1, cS3 and vS2 (Fig. 4a, b). 
cS1 and cS3 cluster together with cS7 and cS10 (Supplementary Fig. 13a) 
such that they bury a space observed in prokaryotic AGO structures and 
concomitantly lengthen the nucleic-acid-binding channel'*!*!7"” 
(Fig. 4c, d). These insertion segments interact with the L2-linker and 
PIWI domains through a hydrogen-bond network involving residues 
that are conserved throughout eukaryotic AGOs (Supplementary Figs 1 
and 13a), which indicates that an extended nucleic-acid-binding channel 
is a general feature of eukaryotic AGOs. 

In all crystallized conformations of TtAGO, the N domain blocks 
the channel and prevents propagation of guide-target pairing beyond 
position 16 (ref. 19) (Fig. 4c and Supplementary Fig. 13c, d). In addition
to lengthening the nucleic-acid-binding channel, the cS1/3/10
cluster positions the KpAGO N domain such that a slight widening 
of the channel would allow pairing to propagate to the 3’ end of the 
guide RNA (Fig. 4d and Supplementary Fig. 13b). The potential for 
unobstructed propagation of guide-target pairing is consistent with 
the prevalence of pairing throughout the 3’ region of plant small 
RNAs that guide target cleavage* and the contribution of such pairing 
to the stability of guide-target association in vitro’.

Figure 4 | An extended, potentially unobstructed nucleic-acid-binding
channel in KpAGO. a, b, Position of N domain (cyan) relative to L1-linker 
domain (yellow) in TtAGO”’ (a) and KpAGO (b). Domains are oriented based 
on their N-terminal beta strands (dashed line connects strand termini). 
Colours, as in Fig. 2a. c, d, Channels of TtAGO ternary complex’? (c) and 
KpAGO with modelled A-form duplex (d). Protein surfaces are rendered, 
highlighting distances between the N and PAZ domains (parallel lines) and the 
cS1/3/10 cluster (blue), which fills a cavity (dashed circle).

Glu 1013 completes a catalytic tetrad

When comparing the structures of KpAGO and the free NcQDE-2
MID-PIWI lobe’’, we observed notable differences in loops L1 and L2 
(Supplementary Table 3). In KpAGO, loop L2 expands by partial 
unfolding of α25 (Fig. 5a) and packs into a cavity, such that the
invariant Glu 1013 side chain inserts into the catalytic pocket, near 
the three Asp residues of the active site (Fig. 5b). This conformation is 
enabled by the movement of loop L1, which otherwise blocks access to 
the catalytic pocket (Fig. 5c). Opening of the loop L1 gate in KpAGO is 
accompanied by a conformational transition of cS11 and hydro- 
phobic packing between aliphatic side chains on loop L1 and cS11. 
Notably, deletion of cS11 from Drosophila AGO1 inhibits guide-RNA
binding and abolishes silencing activity”. 

The plugged-in conformation, in which the Glu 1013 finger 
is inserted into the catalytic pocket, is stabilized by an extensive 
hydrogen-bond network, with Glu 1013 bridging His 977 and
Arg 1045, and loop L2 main-chain atoms interacting with Arg 1045
and Glu 1060 (Fig. 5b). These four residues are conserved throughout
eukaryotic AGOs (and even most of the PIWI clade; Supplementary
Fig. 1). Glu 1013, Arg 1045 and Glu 1060 are also conserved
throughout prokaryotic AGOs, prompting a search for a similar 
plugged-in conformation in the available structures’**!7"*. We
found both plugged-in and unplugged conformations of TtAGO, with 
striking parallels to the eukaryotic hydrogen-bond network and

Figure 5 | A plugged-in glutamate finger at the active site. a, Superposition
of α25 and β34 of KpAGO (green) on counterparts of NcQDE-2 MID-PIWI
lobe (grey), highlighting the extended loop L2 (dark green). b, Hydrogen-bond 
network stabilizing the plugged-in loop L2. Loop L2, dark green; otherwise, as 
in Fig. 3a. c, Closed (left) and open (right) configurations of the loop L1 gate 
(purple) in NcQDE-2 MID-PIWI lobe and KpAGO, respectively. cS11, blue;
otherwise, as in panel b. d, Superposition of the region flanking loop L2 in the 
unplugged (grey) and plugged-in (green) conformations of TtAGO, depicted as 
in panel a. e, Hydrogen-bond network stabilizing the plugged-in loop L2 in 
TtAGO, depicted as in panel b. f, Closed and open configurations of the loop L1 
gate in the unplugged and plugged-in conformations of TtAGO, respectively,

correlated loop movements (Fig. 5a-f). The plugged-in conformation
was observed only in complexes in which the PAZ domain released 
the 3’ end of the guide and TtAGO assumed its catalytically active 
state (Supplementary Table 3). In contrast, the inactive states—either 
those of the apo proteins or complexes in which the PAZ domain 
engages the guide 3’ end—resembled the unplugged conformation. 
These observations indicate that the plugged-in conformation of loop 
L2 is correlated with release of the 3’ end of the guide and formation of 
active RISC. Supporting the functional importance of the plugged-in 
conformation, mutation of any of the four residues of the KpAGO 
hydrogen-bond network impaired RNAi (Fig. 5g). 

The structures of ternary TtAGO complexes in the plugged-in 
conformation show the position of loop L2 in the context of guide 
and target strands. TtAGO loop L2 interacts with the guide DNA at 
positions 11-15 (Supplementary Table 3)'°. Moreover, the carboxyl 
group of the glutamate finger approaches both the 2’-OH of the 
nucleotide adjacent to the scissile phosphate and one of the two 
active-site divalent metal ions (Fig. 5h), which indicates that the 
glutamate finger might act as a catalytic residue. Indeed, simultaneous 
coordination of the analogous 2’-OH and metal ion is the role of 
Glu 109 in the ‘DEDD’ catalytic tetrad at the active site of Bacillus 
halodurans RNase H1 (ref. 46). Although the PIWI domain of AGO 
has an RNase H fold, only a conserved ‘DDX’ catalytic triad (where ‘X’
is generally Asp or His) had been recognized in AGOs with slicer 


depicted as in panel c. g, RNAi reconstituted in S. cerevisiae using wild-type 
(WT) K. polysporus AGO1 or genes with the indicated substitutions. Silencing
was monitored under either permissive (induced hairpin, blue bars) or 
stringent (repressed hairpin, open bars) conditions. Q1052 and Y902, 
conserved residues insensitive and sensitive to substitution, respectively**, were 
included as controls. Dashed lines (blue and black), background fluorescence 
(permissive and stringent conditions, respectively); otherwise, as in Fig. 1b. 
h, Stereoview of KpAGO catalytic residues (green) superpositioned with 
catalytic residues, divalent cations, scissile phosphate and adjacent nucleoside 
in TtAGO (blue) and B. halodurans RNase H1 (BhRNase H1, yellow) ternary 
complexes.

activity*’**. On the basis of analogy to RNase H, a fourth catalytic
residue had been suspected, but previous searches for this missing 
component had focused on the residues corresponding to Arg 1045 
and Glu 1060 (refs 12, 13, 31, 46 and 47), whose conservation and 
proximity to the catalytic pocket are now explained instead by their 
roles in stabilizing the plugged-in conformation (Fig. 5b). In support 
of the glutamate finger as the missing catalytic residue that helps to 
coordinate an active-site metal ion (either directly or through outer- 
sphere contacts), the putative DEDD catalytic tetrads in the plugged- 
in conformations of both TtAGO and KpAGO are nearly isosteric 
with the RNase H DEDD tetrad (Fig. 5h). Moreover, when we assayed 
RNAi, only mutation of Glu1013 abrogated RNAi to the extent 
observed for mutation of Asp 1046, a previously identified active-site 
residue (Fig. 5g). Thus, we propose that the glutamate finger consti- 
tutes the second residue of a universally conserved RNase-H-like 
DEDX catalytic tetrad at the active site of slicing AGOs.

Our new insights suggest the following model for AGO loading and 
catalysis. The apo protein in the unplugged conformation binds the 
siRNA duplex, in part using contacts between the two-nucleotide 
overhang of the guide strand and the PAZ domain. As the duplex 
loads and the 3’ end of the guide strand is released from the PAZ
domain, the glutamate finger inserts into the active site, thereby com- 
pleting the DEDX catalytic tetrad to enable cleavage of the passenger 
strand. After discarding the passenger-strand fragments, the resulting 
RISC remains in a plugged-in conformation resembling that of the 
current structure and is competent to bind and cleave suitably paired 
target RNAs. 

While our manuscript was in review, a structure of human AGO2 
(HsAGO2; also known as EIF2C2) with RNA of unknown biochemical 
origin and function was reported’. The authors noted many contacts 
to the RNA 5’ monophosphate and sugar-phosphate backbone 
analogous to those of KpAGO. Our inspection of the HsAGO2 struc- 
ture further revealed that it also has an extended nucleic-acid-binding 
channel, an N domain positioned to allow unobstructed guide-target 
pairing, and a plugged-in glutamate finger that completes a DEDH 
catalytic tetrad (Supplementary Fig. 14). These similarities indicate 
that the HsAGO2 structure (for which conserved residues were 
mutated to improve diffraction) has features of active RISC and that 
studies of KpAGO will continue to provide insights relevant to meta- 
zoan AGOs. 

In contrast to RNase H, which forms its active site during initial 
folding, AGO requires a conformational change to form its active site. 
What might explain this difference between these two related RNases? 
The constitutive active site of RNase H is well-suited to its role in 
nonspecifically cleaving RNA-DNA hybrids, whereas proper AGO 
function requires high specificity. Coupling siRNA duplex loading (in 
part through recognition by the PAZ domain) with active-site forma- 
tion imparts specificity to AGO, thereby preventing it from cleaving 
any base-paired RNA. After passenger-strand cleavage and removal, 
activity of the licensed AGO is restricted by its guide RNA. In this way, 
AGO activity is tightly controlled and spurious endonucleolytic 
cleavage is prevented. The previous view was that among proteins 
adopting the RNase H fold, RNase H enzymes were unique in having 
a catalytic tetrad, whereas the related endonucleases of this protein 
superfamily (including AGO) were missing the active-site residue 
corresponding to Glu 1013 (ref. 48). Our findings revising this view 
imply that some other proteins for which only a catalytic triad has 
hitherto been identified (for example, bacterial UvrC DNA repair 
protein) might also use the conditional insertion of a ‘missing’ 
catalytic residue to impart specificity. 


METHODS SUMMARY 


KpAGO was overexpressed in Escherichia coli as a His-SUMO-tagged fusion. 
Native and SeMet-substituted crystals were obtained by sitting-drop vapour dif- 
fusion at 20°C. The phase was determined by the single-wavelength anomalous 
dispersion method with selenium anomalous signals. Cleavage assays were
performed with synthetic or transcribed RNA in 30mM Tris-HCl pH7.5, 
130 mM KCl, 10 mM NaCl, 1.1 mM MgCl2, 0.1 mM EDTA, 1.3 mM dithiothreitol
(DTT) and 5% glycerol, including 0.1% Triton X-100 where indicated. Yeast 
manipulations, in vivo assays and high-throughput sequencing were essentially 
as described previously’. Details of all procedures are listed in Methods. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature.

METHODS
Protein purification. DNA encoding K. polysporus AGO1(Thr 207-Ile 1251) 
was cloned into a modified pRSFDuet vector (Novagen) containing an
amino-terminal Ulp1-cleavable His6-SUMO tag. Protein was overexpressed in E. coli
BL21(DE3) Rosetta2 (Novagen). Cell extract was prepared using a French press in
buffer A (10 mM phosphate buffer pH 7.3, 1.5 M NaCl, 25 mM imidazole, 10 mM
β-mercaptoethanol, 1 mM phenylmethylsulphonyl fluoride) and cleared by
centrifugation. The supernatant was loaded onto a nickel column (GE 
Healthcare) and then washed with buffer A. The target protein was eluted with 
a linear gradient of 0.025-1.5 M imidazole. After mixing with Ulp1 protease, the 
eluted sample was dialysed against buffer B (10 mM phosphate buffer pH 7.3,
500 mM NaCl, 20 mM imidazole, 10 mM β-mercaptoethanol) overnight. The
digested protein was loaded onto a nickel column to remove the cleaved
His6-SUMO tag. The flow-through sample was dialysed against buffer C (5 mM
phosphate buffer pH 7.3, 10 mM β-mercaptoethanol) and then loaded onto an SP
column (GE Healthcare). The protein was eluted with a linear gradient of 
0.0-2.0M NaCl, mixed with ammonium sulphate (2M final concentration) 
and then centrifuged. The supernatant was loaded onto a phenyl-Sepharose 
hydrophobic interaction column (GE Healthcare) in buffer D (10 mM phosphate 
buffer pH 7.3, 2 M ammonium sulphate, 10 mM β-mercaptoethanol), and the
protein was eluted with a linear gradient of 2.0-0.0M ammonium sulphate. 
The eluted protein was dialysed against buffer E (300 mM sodium dihydrogen 
phosphate, 10 mM β-mercaptoethanol) and then loaded onto a MonoQ column
(GE Healthcare) in buffer E. The protein was eluted with a linear gradient of 
0.0-2.0M NaCl. The eluted sample was concentrated by ultrafiltration and 
loaded onto a HiLoad 200 16/60 column (GE Healthcare) in buffer F (10 mM 
Tris-HCl pH 7.5, 200 mM NaCl, 5 mM DTT). Purified KpAGO was concentrated 
to approximately 40 mg ml−1 using ultrafiltration and stored at −80 °C in protein
storage buffer (10 mM Tris-HCl pH 7.5, 200 mM NaCl, 5mM DTT). 
Structure determination and refinement. Initial crystals of recombinant 
KpAGO diffracted poorly but could be improved by addition of 1,4-dioxane to 
the crystallization buffer. Native crystals of KpAGO were obtained at 20°C by 
sitting-drop vapour diffusion in 100 mM MIB buffer pH 5.0 (molar ratio, 2 Na- 
malonate:3 imidazole:3 boric acid), 3% 1,4-dioxane, 19% PEG3350, 12 mM 
MnCl2 and 3% ethanol. SeMet-substituted crystals were grown at 20°C by sitting-
drop vapour diffusion in 100 mM MIB pH 5.0, 3% 1,4-dioxane, 19% PEG3350,
12 mM MnCl2, 3% ethanol and 9 mM sarcosine. The native and SeMet-substituted
crystals of KpAGO were soaked in collection buffer (1.2-fold concentrated reservoir 
solution) and cryoprotected with 20% glycerol. Both derivative data sets were 
collected at the Advanced Photon Source NE-CAT beamlines. Data were pro- 
cessed with HKL2000°. Data collection and refinement statistics are listed 
(Supplementary Table 1). A total of 33 selenium sites were found using peak data 
with HKL2MAP®" and were used for phase calculation at 4.2 Å resolution with
Phaser-EP*’. The initial phases were improved by solvent flattening, electron 
density histogram and non-crystallographic symmetry averaging with Parrot 
and DM”. The initial model was built manually with Coot™ and was improved 
by iterative cycles of refinement with Phenix”’. Molecular replacement was per- 
formed with MOLREP” using the SeMet structure as a search model. The final 
model was improved using the native data processed at 3.2 Å. The Ramachandran
plot analysis by PROCHECK® showed 82.0%, 17.3% and 0.8% of the protein 
residues in the most favourable, additionally allowed and generously allowed 
regions, respectively, with no residues in disallowed regions. The simulated- 
annealing omit map was calculated by CNS”. All figures of structures were 
generated with PYMOL™. 
RNAs. A list of RNA oligonucleotide sequences is provided (Supplementary 
Table 4). To generate cap-labelled target RNAs, RNA was transcribed in vitro 
with T7 RNA polymerase using DNA oligonucleotide templates. DNase-treated 
transcripts were purified on a denaturing gel and capped using the ScriptCap m7G
Capping System (CellScript) according to the manufacturer’s directions, except
that high-specific-activity RNA was prepared by omitting GTP and including 5 µl
[α-32P]GTP (6,000 Ci mmol−1), and low-specific-activity RNA was prepared by
using a 1,500:1 molar ratio of GTP:[α-32P]GTP (6,000 Ci mmol−1). Cap-labelled
RNA was gel-purified and quantified by scintillation counting, and 10X stocks
were prepared in water supplemented with 1 µM DNA carrier oligonucleotide.
5'-phosphorylated guide RNA and its 2’-deoxy-substituted variants were 
chemically synthesized (IDT) and gel purified. To prepare 5’ end-labelled RNAs, 
5'-OH RNAs were chemically synthesized (Dharmacon), deprotected, purified on a 
denaturing gel, phosphorylated with [γ-32P]ATP (6,000 Ci mmol−1) using T4
polynucleotide kinase (PNK, NEB) and gel purified again. To prepare 3’ end- 
labelled RNAs, 5’-phosphorylated RNAs lacking the terminal nucleotide (that is, 
22-nucleotide variants) were chemically synthesized (IDT), gel purified, extended 
using cordycepin 5'-[α-32P]triphosphate (5,000 Ci mmol−1) and yeast poly(A)
polymerase (USB) and gel purified again.
siRNA duplexes were prepared by annealing synthetic ssRNAs. Complementary
RNAs, designed to hybridize to generate 21-base-pair duplexes with
two-nucleotide 3’ overhangs, were combined (using at least threefold excess 
unlabelled RNA) in dsRNA annealing buffer (30 mM Tris-HCl pH 7.5,
100 mM NaCl, 1 mM EDTA) and slow-cooled from 90 °C to room temperature 
over >2h. Annealed RNAs were separated from ssRNAs on native 20% 
polyacrylamide gels, and duplexes were eluted from gel slices in 0.3M NaCl 
overnight at 4°C, ethanol-precipitated and stored in dsRNA storage buffer 
(10 mM Tris-HCl pH 7.5, 10mM NaCl, 0.1 mM EDTA). RNA was quantified 
by scintillation counting, and 10X stocks were prepared in dsRNA storage buffer
supplemented with 1 µM DNA carrier oligonucleotide.
AGO activity assays. For all biochemical assays, KpAGO was diluted and stored at
−20 °C in protein dilution buffer (5 mM Tris-HCl pH 7.5, 100 mM NaCl, 2.5 mM
DTT, 50% glycerol). The concentration of KpAGO was determined by absorbance
at 280 nm. For the slicing assay in Fig. 1c, 1.1 µM KpAGO was pre-incubated with
110 nM guide RNA in 1.1X reaction buffer (1X reaction buffer: 30 mM Tris-HCl
pH 7.5, 130 mM KCl, 1.1 mM MgCl2, 1 mM DTT, 0.1 mM EDTA) for 1 h at 25 °C.
To initiate the slicing reaction, 1 µl cap-labelled target RNA (final concentration,
200 nM) was added to 9 µl of the pre-incubated mixture. Reactions were incubated
at 30 °C, and 3-µl aliquots were removed at the indicated time and quenched by
addition to 12 µl formamide loading buffer (95% formamide, 18 mM EDTA,
0.025% sodium dodecyl sulphate, 0.025% xylene cyanol, 0.025% bromophenol 
blue). The slicing assay in Fig. 2h was conducted similarly except pre-incubation 
was performed with 110nM KpAGO and 50pM guide RNA, and target was 
subsequently added to a final concentration of 100pM. The slicing assay in 
Fig. 2f was guided by copurifying RNA and thus did not involve pre-incubation. 
These reactions contained 1X reaction buffer supplemented with 0.1% Triton 
X-100, 100nM KpAGO (or an equal volume of protein dilution buffer) and 
100 pM target RNA. Reactions were incubated at 30 °C, and 5 µl aliquots were
removed at the indicated time and quenched by addition to 15 µl formamide
loading buffer. The passenger-strand cleavage reactions in Fig. 3e contained 1X
reaction buffer, 100 µg ml−1 Ultrapure BSA (Ambion), 10 nM KpAGO and 50 pM
substrate. All other passenger-strand cleavage reactions contained 1X reaction 
buffer supplemented with 0.1% Triton X-100, 100nM KpAGO (or an equal 
volume of protein dilution buffer) and 50 pM substrate. Reactions were incubated 
at 30 °C, and 5 µl aliquots were removed at the indicated time and quenched by
addition to 10-15 µl formamide loading buffer.
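
The 1.1X pre-incubation concentrations can be read as compensating for the one-tenth volume of target added to start the reaction. The short Python sketch below works through that dilution arithmetic for the Fig. 1c assay. The helper name `after_dilution` is illustrative, and the sketch assumes that the added target solution contributes none of the diluted components.

```python
def after_dilution(conc, volume_ul, added_ul):
    """Concentration after adding `added_ul` of a solution lacking the component."""
    return conc * volume_ul / (volume_ul + added_ul)

# Fig. 1c slicing assay: 9 µl pre-incubated mixture plus 1 µl target RNA.
kpago_um = after_dilution(1.1, 9, 1)   # µM KpAGO in the final reaction
guide_nm = after_dilution(110, 9, 1)   # nM guide RNA in the final reaction
buffer_x = after_dilution(1.1, 9, 1)   # reaction-buffer strength (X)

# Target stock needed for 200 nM final when 1 µl is brought to 10 µl total.
target_stock_nm = 200 * (9 + 1) / 1

print(f"KpAGO {kpago_um:.2f} µM, guide {guide_nm:.0f} nM, buffer {buffer_x:.2f}X")
print(f"target stock {target_stock_nm:.0f} nM")
```

Under these assumptions the target stock is tenfold more concentrated than the final reaction, in line with the 10X stocks described for cap-labelled RNA.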

To monitor cleavage, RNAs were resolved on denaturing (7.5M urea) 
polyacrylamide gels (15% gel for target cleavage using synthetic guide RNA, 
20% for target cleavage using copurifying guide RNA or 22.5% for passenger- 
strand cleavage), and radiolabelled products were visualized by phosphorimaging 
(Fujifilm BAS-2500) and quantified using Multi Gauge (Fujifilm). For kinetic 
analyses, at each time point (t) the fraction product was measured as
Fp = product/(product + substrate). Data in Fig. 3e were fit with a smoothed 
curve using the cubic spline method implemented in KaleidaGraph. 
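
As an illustration of the fraction-product calculation, the following Python sketch applies Fp = product/(product + substrate) to a hypothetical time course. The band intensities are invented for the example and are not data from this study; the function name is likewise illustrative.

```python
def fraction_product(product, substrate):
    """Return Fp = product / (product + substrate) for one gel lane."""
    total = product + substrate
    if total <= 0:
        raise ValueError("product + substrate must be positive")
    return product / total

# Hypothetical background-subtracted band intensities (arbitrary units),
# keyed by time point in minutes: (product, substrate).
time_course = {0: (0.0, 1000.0), 5: (250.0, 750.0), 30: (800.0, 200.0)}

fractions = {t: fraction_product(p, s) for t, (p, s) in time_course.items()}
for t in sorted(fractions):
    print(f"t = {t:>2} min  Fp = {fractions[t]:.2f}")
```

Because Fp is a ratio within each lane, it is insensitive to lane-to-lane differences in loading.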

Analysis of copurifying RNA. To avoid loss of especially small fragments, 
polynucleotides were extracted without subsequent precipitation. For analysis 
of pooled crystals, approximately 100 crystals were collected, stored in harvest 
buffer (100 mM MIB buffer pH 5.0, 3% 1,4-dioxane, 20% PEG 3350, 12 mM MnCl2,
3% ethanol, 6 mM sarcosine, 25% glycerol) and immediately frozen. After thawing, 
the mixture was diluted with an equal volume of water and extracted with an equal 
volume of phenol:chloroform:isoamyl alcohol (25:24:1, Sigma) followed by extrac- 
tion with chloroform. The aqueous phase was retained and diluted 1:20 in water for 
use in labelling reactions or used undiluted to prepare sequencing libraries. For 
analysis of soluble protein, polynucleotides were similarly extracted from 1.5 nmol 
KpAGO. For analysis of individual crystals, each single crystal was collected, stored in 
1 mM EDTA and immediately frozen. After thawing, the mixture was heated at 90 °C 
for 3 min, chilled on ice for 5 min and extracted with an equal volume of phenol:- 
chloroform:isoamyl alcohol (25:24:1, Sigma) followed by two extractions with chlo- 
roform. The aqueous phase was retained and used undiluted in labelling reactions. 

Prior to 5’ labelling, polynucleotides were dephosphorylated in a 20 µl reaction
containing 2 µl diluted polynucleotides (or water) and 1X PNK buffer (NEB) in
the presence or absence of 2 units of thermosensitive alkaline phosphatase (TSAP,
Promega) for 30 min at 37 °C. To inactivate TSAP, the reaction was quenched
with 1 µl 220 mM EDTA and incubated at 74°C for 15 min. 5’ phosphorylation
was performed in a 30 µl reaction containing 21 µl heat-inactivated TSAP
reaction, 3 units T4 PNK (NEB), 0.04 µl [γ-32P]ATP (8,000 Ci mmol−1), 0.8 µl
10X PNK buffer and 1 µl 240 mM MgCl2 for 1 h at 37°C. Reactions were
quenched with an equal volume of 2X urea loading buffer, and products were 
resolved on a denaturing 22.5% polyacrylamide gel. For analysis of nuclease 
sensitivity, 15 µl aliquots were removed from PNK reactions and incubated with
2 µl RNase I (Ambion) or RQ1 RNase-free DNase (Promega) for 30 min at 37 °C
before gel analysis.

To monitor preparation of sequencing libraries, trace amounts of synthetic
3'-pCp[5’-32P]-labelled 7- and 23-nucleotide RNA internal standards were
added to 2 µl undiluted polynucleotides isolated from soluble or crystalline
KpAGO (or a water-only mock control). Dephosphorylation was performed in 
a 30 µl reaction containing 3 units TSAP (Promega) and 1X PNK buffer (NEB)
supplemented with 2 µl manganese-chelating mix (10 mM MgCl2, 10 mM
EDTA) for 30 min at 37 °C. To inactivate TSAP, the reaction was quenched with
1.5 µl 240 mM EDTA and incubated at 74 °C for 15 min. RNA was ligated to
pre-adenylated adaptor DNA in a 50 µl reaction containing 32 µl heat-inactivated
TSAP reaction, 100 pmol adaptor DNA*’, 45 units T4 RNA ligase 1 (NEB), 10%
PEG8000 (NEB), 2 µl 10X PNK buffer and 1 µl 390 mM MgCl2 for 2.5 h at room
temperature. After phenol extraction and precipitation, 28-50-nucleotide ligation
products were gel-purified and 5’ phosphorylated in a 50 µl reaction containing
20 units T4 PNK (NEB) and 1X PNK buffer supplemented with 1 µl
[γ-32P]ATP (6,000 Ci mmol−1) for 30 min at 37°C, followed by a chase with
10 µl cold reaction mixture (1X PNK buffer, 28 units T4 PNK, 6 mM ATP)
and incubation for an additional 30 min at 37 °C. After desalting, phenol extrac- 
tion and precipitation, RNA was ligated to a 5'-adaptor RNA, gel-purified, con- 
verted to complementary DNA, amplified 10 cycles (soluble) or 12 cycles 
(crystalline) and sequenced using the Illumina SBS platform. The library pre- 
pared without input polynucleotides did not yield an observable PCR product, 
indicating minimal contamination from polynucleotides that might copurify 
with the enzymes used for library construction. 

Sequencing reads were filtered by requiring that they contain a perfect match 
to the first 12 nucleotides of the 3’ adaptor and that every nucleotide up to 
the beginning of the 3’ adaptor have a Phred+64 quality score of at least ‘p’. 
After removing the internal-standard reads and trimming away the adaptor 
sequences, reads representing the small RNAs were collapsed to a non-redundant 
set of 8-24-nucleotide sequences. To examine the origins of the copurifying 
RNAs, 15-24-nucleotide sequences were mapped sequentially to the KpAGO 
expression plasmid, the chloramphenicol-resistance gene found on pRARE2, 
and the BL21(DE3) genome”, allowing no mismatches and recovering all hits. 
(The 15-nucleotide lower bound for mapping was chosen because this was the 
minimum read length that achieved a <1% genome-mapping rate for random or 
shuffled small-RNA sequences). Because there were fewer fortuitous matches to 
the KpAGO expression plasmid, analysis of 12-nucleotide sequences was per- 
formed on reads that mapped to the plasmid. For mapping-independent analyses, 
sequences with <10 reads were not considered. 
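
The read-filtering steps above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in the study: the function name, the adaptor sequence, the quality threshold character and the toy reads are all hypothetical, and trimming at the first occurrence of the 12-nucleotide adaptor prefix is an assumption. Mapping to the plasmid and genome is not covered.

```python
from collections import Counter

def trim_and_collapse(reads, adaptor, min_qual_char, standards=frozenset(),
                      seed_len=12, min_len=8, max_len=24):
    """Filter, trim and collapse (sequence, quality) pairs into insert counts."""
    seed = adaptor[:seed_len]
    counts = Counter()
    for seq, qual in reads:
        pos = seq.find(seed)          # perfect match to the adaptor prefix
        if pos < 0:
            continue
        # Phred+64 characters sort by ASCII code, so a character comparison
        # is equivalent to comparing the underlying quality scores.
        if any(q < min_qual_char for q in qual[:pos]):
            continue
        insert = seq[:pos]            # sequence preceding the 3' adaptor
        if insert in standards:       # drop internal-standard reads
            continue
        if min_len <= len(insert) <= max_len:
            counts[insert] += 1
    return counts

ADAPTOR = "TCGTATGCCGTCTTCTGCTTG"     # hypothetical 3' adaptor
insert_a = "ACGT" * 5 + "AC"          # toy 22-nucleotide insert
reads = [
    (insert_a + ADAPTOR[:14], "h" * 36),       # passes all filters
    (insert_a + ADAPTOR[:14], "h" * 36),       # duplicate, collapsed
    (insert_a + ADAPTOR[:14], "B" * 36),       # fails the quality filter
    ("ACGT" + ADAPTOR[:14], "h" * 18),         # insert shorter than 8 nt
    ("ACGTACGTACGT" + "G" * 14, "h" * 26),     # no adaptor match
]
collapsed = trim_and_collapse(reads, ADAPTOR, min_qual_char="T")
print(dict(collapsed))
```

Collapsing to a counter keeps both the non-redundant sequence set and the read count per sequence, which the later read-count threshold requires.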

For analysis of nucleotide composition, information content was calculated by 
determining the relative frequency of each nucleotide at position X compared to 
the relative frequency at all other positions combined. The selectivity for a given 
nucleotide n at position X was calculated using the following equation: 


$$S_n = \frac{f(n,X)/f(n,{\sim}X)}{\sum_{i \in \{A,C,G,U\}} f(i,X)/f(i,{\sim}X)}$$


where f(i,X) is the frequency of nucleotide i at position X and f(i,~X) is the 
frequency of nucleotide i at all other positions. 
Information content scores were then calculated using the following equation: 


$$I_n = S_n \times [\log_2(S_n) + 2]$$
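
A small Python sketch of these two calculations is given below. It assumes that selectivity is the enrichment ratio f(n,X)/f(n,~X) normalized over the four nucleotides, which is one reading of the selectivity equation; the function names and the example frequencies are illustrative only.

```python
import math

NUCLEOTIDES = "ACGU"

def selectivity(freq_at_x, freq_elsewhere):
    """Normalized enrichment of each nucleotide at position X.

    Assumes every nucleotide has a non-zero frequency at the other positions.
    """
    ratios = {n: freq_at_x[n] / freq_elsewhere[n] for n in NUCLEOTIDES}
    total = sum(ratios.values())
    return {n: r / total for n, r in ratios.items()}

def information_content(sel):
    """I_n = S_n * [log2(S_n) + 2], taking I_n = 0 when S_n = 0."""
    return {n: (s * (math.log2(s) + 2) if s > 0 else 0.0)
            for n, s in sel.items()}

uniform = {n: 0.25 for n in NUCLEOTIDES}
only_u = {"A": 0.0, "C": 0.0, "G": 0.0, "U": 1.0}

info_uniform = information_content(selectivity(uniform, uniform))
info_only_u = information_content(selectivity(only_u, uniform))
print(info_uniform)
print(info_only_u)
```

With this definition a position with no preference scores zero for every nucleotide, and a position occupied by a single nucleotide reaches the two-bit maximum.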


For phasing analysis, the frequency of distances separating 5'-end pairs (i, j) 
mapping to opposite DNA strands was calculated using the following equation: 


$$\mathrm{Frequency}_D = \sum_{(i,j)} \min(\mathrm{Reads}_i, \mathrm{Reads}_j)_D$$


where D = (distance between small-RNA 5’ ends) + 1 
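
The phasing calculation can be sketched in Python as shown below. The coordinate convention is an assumption made for the example: 5′ ends on both strands are given as positions on the plus strand, so the distance for a pair is the minus-strand position minus the plus-strand position. The function name, the `max_d` cut-off and the read counts are hypothetical.

```python
from collections import defaultdict

def phasing_frequency(plus_ends, minus_ends, max_d=40):
    """Sum min(Reads_i, Reads_j) over opposite-strand 5'-end pairs, by D.

    plus_ends / minus_ends map a 5'-end position to its read count.
    """
    freq = defaultdict(int)
    for p, reads_p in plus_ends.items():
        for m, reads_m in minus_ends.items():
            d = (m - p) + 1           # D = (distance between 5' ends) + 1
            if 1 <= d <= max_d:
                freq[d] += min(reads_p, reads_m)
    return dict(freq)

# Toy data: two plus-strand and two minus-strand 5' ends with read counts.
plus_ends = {100: 50, 123: 10}
minus_ends = {120: 30, 143: 20}

phasing = phasing_frequency(plus_ends, minus_ends)
print(phasing)
```

Taking the minimum of the two read counts means that a pair contributes only as many reads as its less abundant partner, so a single highly abundant species cannot dominate the distribution on its own.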
Yeast manipulations. S. castellii and S. cerevisiae were grown at 25 °C and 30 °C, 
respectively, on standard S. cerevisiae plate and liquid media (for example, YPD 
and SC). Transformations of S. castellii were performed as described previously”. 
Transformations of S. cerevisiae were performed as described®'. For FACS ana- 
lyses, strains were inoculated in SC, in either non-inducing (2% glucose) or 
inducing (1% galactose and 1% raffinose) conditions, and grown overnight. 
Fresh cultures were then seeded from the overnight cultures, and cells were grown 
to log phase. Cells were analysed using FACSCalibur (BD Biosciences); data were 
processed with CellQuest Pro (BD Biosciences) and FlowJo (Tree Star). 
Plasmids and strains used and generated in this study are listed (Supplementary
Tables 5 and 6). Vectors pRS404CYC1-KpAGO1 and pRS405TEF-KpDCR1
were constructed by insertion of the coding sequence of the respective
K. polysporus genes between the CYC1 or TEF promoter and CYC1 terminator
(cloned from p416CYC or p416TEF®”) of the appropriate vector® using SpeI
and XhoI sites (KpAGO1) or BamHI and XhoI sites (KpDCR1). Vector
pRS404CYC1-KpAGO1 (207-1251) was constructed similarly, with the insertion
of an ‘ATG’ codon upstream of amino acid 207. Vector pRS404CYC1-FLAG3-KpAGO1
was generated by PCR-based insertion of the sequence encoding the
Flag3 epitope downstream of the ‘ATG’ codon of pRS404CYC1-KpAGO1. Point
mutations were introduced by PCR-based mutagenesis to generate vectors encoding
mutant Flag-tagged Ago1. pRS402GPD-GFP(S65T) was constructed by insertion
of the coding sequence of GFP(S65T) (amplified from pFA6a-GFP(S65T)-kanMX6™)
between the GPD promoter and CYC1 terminator (cloned from
p416GPD™) of pRS402® using SpeI and XhoI sites. To reconstitute RNAi in
S. cerevisiae, GFP(S65T), KpAGO1 and KpDCR1 expression vectors were integrated
into W303-1B variants already containing other components of the
GFP-silencing system”, using standard protocols*'. To generate S. castellii strains
DPB267 and DPB268 for degradome sequencing, XRN1 was deleted in DPB005
and DPB007, respectively, using the kanMX6 cassette™.

Degradome sequencing and analysis. Total RNA was isolated from mid-log 
phase (OD600 ≈ 0.6) cultures of strains DPB267 and DPB268 using the hot-phenol
method. Degradome libraries were constructed from 5 µg poly(A)+ RNA essentially
as described** and sequenced on the Illumina SBS platform. After removing
adaptor sequences and generating each reverse complement, reads representing 
degradome-cleavage tags were collapsed to a non-redundant set. To analyse tags 
deriving from Y'-element loci, 20-21-nucleotide sequences were mapped to a 
consensus S. castellii Y' element as described previously”’, and 49% of reads in the 
AGO1 library were randomly sampled (to normalize for higher sequencing yield, 
Supplementary Fig. 2b) and used for subsequent analyses. Mapping data were 
then used to generate a single-nucleotide-resolution plot of the consensus Y' 
element. For phasing analysis, the frequency of distances separating opposite- 
strand pairs of 5’ ends of 20-21-nucleotide degradome tags (i), and 5’ ends of 
22-23-nucleotide small RNAs (j) was calculated using the following equation: 


$$\mathrm{Frequency}_D = \sum_{i,j} \mathrm{Reads}_i \times \mathrm{Reads}_j / \mathrm{Norm}_i$$


where D = position of 5’ end of degradome tag with respect to 5’ end of small 
RNA, and Norm_i = number of reads for all small RNAs in which the 5’ end of
degradome tag i falls. Fractional frequencies were calculated for each D by dividing
Frequency_D by the total number of reads corresponding to degradome-tag 5’
ends that map opposite 22-23-nucleotide small RNAs. 
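
The weighting in this calculation can be illustrated with the Python sketch below. It makes several simplifying assumptions that are not taken from the study: tag and small-RNA positions share one coordinate system that runs 5′ to 3′ along the small RNA, each small RNA is a (5′ position, length, reads) tuple, and D is the 1-based position within the small RNA opposite the tag 5′ end. All names and numbers are hypothetical.

```python
from collections import defaultdict

def degradome_phasing(tags, small_rnas):
    """Fractional frequency of tag 5' ends at each offset D within small RNAs.

    tags: {position: reads}; small_rnas: list of (five_prime, length, reads).
    """
    freq = defaultdict(float)
    covered_reads = 0
    for t, reads_t in tags.items():
        overlapping = [(s, r) for s, length, r in small_rnas
                       if s <= t < s + length]
        norm = sum(r for _, r in overlapping)   # Norm_i for this tag
        if norm == 0:
            continue                            # tag lies opposite no small RNA
        covered_reads += reads_t
        for s, reads_s in overlapping:
            freq[t - s + 1] += reads_t * reads_s / norm
    if covered_reads == 0:
        return {}
    return {d: f / covered_reads for d, f in freq.items()}

small_rnas = [(1, 23, 30), (3, 23, 10)]   # toy 23-nucleotide small RNAs
tags = {10: 8, 40: 5}                     # toy degradome-tag 5' ends

fractional = degradome_phasing(tags, small_rnas)
print(fractional)
```

Dividing by Norm_i distributes each tag's reads across the small RNAs it overlaps in proportion to their abundance, so the fractional frequencies sum to one.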

The abundance of heavy elements (metallicity) in the photospheres
of stars similar to the Sun provides a ‘fossil’ record of the chemical 
composition of the initial protoplanetary disk. Metal-rich stars are 
much more likely to harbour gas giant planets'*, supporting the 
model that planets form by accumulation of dust and ice particles’. 
Recent ground-based surveys suggest that this correlation is 
weakened for Neptunian-sized planets**°. However, how the rela- 
tionship between size and metallicity extends into the regime of 
terrestrial-sized exoplanets is unknown. Here we report spectro- 
scopic metallicities of the host stars of 226 small exoplanet candi- 
dates discovered by NASA’s Kepler mission", including objects 
that are comparable in size to the terrestrial planets in the Solar 
System. We find that planets with radii less than four Earth radii 
form around host stars with a wide range of metallicities (but on 
average a metallicity close to that of the Sun), whereas large planets 
preferentially form around stars with higher metallicities. This 
observation suggests that terrestrial planets may be widespread 
in the disk of the Galaxy, with no special requirement of enhanced 
metallicity for their formation. 

In February 2011, the Kepler mission'® announced its discovery of 
1,235 planet candidates, of which more than half have radii smaller 
than that of Neptune’’: R_P < 4 R_⊕, where R_⊕ is the Earth radius. We 
used reconnaissance spectra obtained by the Kepler Follow-up 
Observing Program (FOP) to derive metallicities for several hundred 
of the brighter planet candidates, and used the results to explore the 
relationship between planet size and host-star metallicity. Metallicity, 
denoted [m/H], is defined as the proportion of a star’s outer layers 
made up of chemical elements other than hydrogen and helium and 
expressed on a logarithmic scale where zero is the Sun’s metallicity. 
Thousands of spectra have been gathered by the Kepler FOP, but the 
majority of the spectra have signal-to-noise ratios too low to extract 
precise stellar parameters using traditional methods. To take full 
advantage of this large observational effort, we have developed a tool 
(stellar parameter classification (SPC); see Supplementary Information) 
that uses a library of synthetic spectra to determine stellar parameters 
from spectra with modest signal-to-noise ratios (signal-to-noise per 
pixel >15). Using this approach, we derived metallicities in a consistent 
and homogeneous manner for the entire sample of Kepler FOP spectra, 
thus avoiding the systematic differences that can occur when compar- 
ing metallicities derived by different techniques. Only the most robust 
classifications are presented here (Supplementary Information), yield- 
ing precise stellar parameters for 152 stars harbouring 226 planet 


candidates mostly in orbits within 0.5 AU of the host star. We used 
the stellar parameters from SPC and the Yonsei-Yale stellar evolu- 
tionary models” to estimate the radii of the host stars, which we couple 
with the photometric data from the Kepler mission” to infer the planet 
radii (Supplementary Information). 

Previous studies**° have suggested that the observed correlation 
between metallicity and the likelihood that solar-type stars host gas 
giants is weaker for Neptunian-sized planets. However, it is unclear 
whether this correlation extends into the regime of terrestrial-sized 
planets, which is important for a better understanding of planet- 
formation processes. The number of host stars with planets smaller 
than Neptune in our sample (175 planets) is significantly larger than in 
earlier studies and includes much smaller planets (as small as Earth). 
This allows us to compare a statistically significant sample of homogeneously 
derived spectroscopic metallicities of solar-type stars hosting 
small and large planets. By contrast, a recent study used metallicity 
indicators based on photometry". In Fig. 1, we show that the average 
metallicity of stars hosting planets with radii smaller than that of 
Neptune (R_P < 4.0 R_⊕) is lower ([m/H] = −0.01 ± 0.02) than that of 
the stars harbouring gas giant planets ([m/H] = +0.15 ± 0.03). We 
find that smaller planets are observed at a wide range of host-star 
metallicities (−0.6 < [m/H] < +0.5), whereas larger planets are 
detected preferentially around stars with higher metallicity (Figs 2 
and 3). To investigate the statistical significance of the difference in 
metallicity, we perform the two-sample Kolmogorov–Smirnov test of 
the two subsamples of host stars and find the probability that the two 
distributions are not drawn randomly from the same parent population 
is 99.96% (over 3.5σ). An F-test shows that fitting the data in 
Fig. 3 with a metallicity that increases linearly, as opposed to being 
constant, as a function of radius yields a better fit with a confidence 
level of 99.99995% (~5σ). 
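
As a rough illustration of the statistics quoted above, the following sketch computes a two-sample Kolmogorov–Smirnov statistic, an asymptotic p-value, and the Gaussian-equivalent significance of a confidence level. It is not the authors' code: the function names are hypothetical, the p-value uses the standard asymptotic series with a common small-sample correction, and the two-sided Gaussian convention for converting confidence to σ is an assumption.

```python
import math
from statistics import NormalDist

def ks_statistic(a, b):
    """Maximum distance between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def ks_pvalue(d, n, m, terms=100):
    """Asymptotic probability that both samples share a parent population."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        return 1.0  # series converges poorly here; probability is ~1
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * s))

def confidence_to_sigma(confidence):
    """Two-sided Gaussian-equivalent significance of a confidence level."""
    return NormalDist().inv_cdf((1.0 + confidence) / 2.0)

print(round(confidence_to_sigma(0.9996), 2))      # quoted as "over 3.5 sigma"
print(round(confidence_to_sigma(0.9999995), 2))   # quoted as "~5 sigma"
```

Under the two-sided convention, 99.96% corresponds to slightly more than 3.5σ and 99.99995% to about 5σ, in line with the values quoted in the text.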

Figures 2 and 3 reveal that the population of small planets has a wide 
range of host-star metallicities, but on average the metallicity of the stars 
hosting the smaller planets is lower than that of the larger planets. The 
Kepler-11 system'* demonstrates that small planets can possess a wide 
range of mean densities, much like their Jupiter-sized counterparts, and 
the low mean density of exoplanets Kepler-11d, e and f implies that 
these planets formed before the gas in the system dissipated completely. 
The metallicity of the protoplanetary disk may have a key role in 
how quickly planetary cores can form and, thus, in whether they are 
able to accrete a gaseous envelope before the gas in the system dis- 
sipates. However, additional data, including dynamical masses, are 

Figure 1 | Average host-star metallicities. Stellar metallicity is defined as 
$[\mathrm{m/H}] = \log_{10}(N_\mathrm{m}/N_\mathrm{H})_\mathrm{star} - \log_{10}(N_\mathrm{m}/N_\mathrm{H})_\mathrm{Sun}$, where N_m and N_H are 
respectively the number densities of metal atoms (all elements more massive 
than helium) and hydrogen atoms. Red points represent the average metallicity 
of the host stars with planets of different radii grouped in 1.33 R_⊕ and 4 R_⊕ bins. 
The bin size is indicated by the length of the horizontal line and the uncertainty 
in the average metallicity is given by the standard error. The shaded grey 
histogram shows the number of planets in each bin, and illustrates the large 
number of small planets in the Kepler sample. The average metallicity of host 
stars with smaller planets (R_P < 4 R_⊕) is lower ([m/H] = −0.01 ± 0.02) than 
that of host stars with larger planets ([m/H] = +0.15 ± 0.03). Some of the 
planetary candidates in the Kepler sample are expected to be false positives that 
do not turn out to be transiting planets, such as occurs when the reduced signal 
from a background eclipsing binary is by chance contained within the 
photometric aperture of the foreground target star. The false-positive rate of the 
candidates that pass the standard vetting procedures applied by the Kepler team 
has been estimated to be less than 10% (ref. 26). Therefore, such a low false- 
positive rate is not expected to impact our results and interpretation. We have 
thus ignored possible contamination by false positives. We do not derive 
absolute probabilities or occurrence rates of planets and therefore do not attempt 
to eliminate the many strong bias and selection effects that, for example, 
completeness studies (for example ref. 27) must take into account. We have 
explored the possibility that correlations between planet size and parameters 
such as orbital semi-major axis are the source of the apparent dependence on 
metallicity, but find no evidence for such an effect (Supplementary Information). 

Figure 2 | Comparison of host-star metallicities for small and large planets. 
The histograms compare the metallicities of two samples of stars hosting 
planets by dividing the sample at R_P = 4 R_⊕. The host stars of the gas giant 
planets (R_P ≥ 4 R_⊕; red histogram) are clearly more metal rich than those of the 
smaller planets (R_P < 4 R_⊕; blue histogram), which have a much wider range of 
metallicities. The hatched area represents the area where the histograms 
overlap. A Kolmogorov-Smirnov test shows that the probability that the two 
distributions are not drawn randomly from the same parent population is 
greater than 99.96%; that is, the two distributions differ by more than 3.5σ. The 
average metallicity of the stars with small planets ([m/H] = −0.01 ± 0.02; 
blue histogram) differs by almost 5σ from that of the larger planets 
([m/H] = +0.15 ± 0.03; red histogram). 

Figure 3 | Individual host-star metallicity as a function of planet radius. 
The black dots represent single-planet systems, whereas the green dots 
represent the largest planet and the red dots represent all the smaller planets in 
multiple-planet systems. The confirmed, published Kepler planets in our 
samples are plotted as squares with the same colour code as the dots. Planet 
candidates in multiple systems are each added to the sample with the same 
host-star metallicity. In Supplementary Information, we consider systems of 
planets as opposed to individual planets by neglecting all but the largest planet 
in each system. The vertical dotted line indicates the division of the sample at 
R_P = 4.0 R_⊕. The data show that Kepler detects small planets around stars with 
a wide range of metallicities (−0.6 < [m/H] < 0.5), and that larger planets are 
found preferentially around stars with solar metallicity or higher. The average 
uncertainty in the individual measurements in metallicity is 0.08 dex and that in 
planetary radius is 12%. 


needed to better understand the seemingly diverse regime of small 
planets. 

Our data show that the well-established correlation between 
metallicity and occurrence of giant planets’’ does not extend into 
the smaller planet regime (R_P < 4 R_⊕), where the host stars 
instead have a wide range of metallicities. This observation implies 
that, by contrast with smaller planets, gas giants require exceptional 
conditions to trigger their formation. Our findings agree well with the 
core accretion theory for planet formation, whereby high-metallicity 
environments allow planetary cores to grow rapidly to reach approxi- 
mately ten times the mass of the Earth, continue to accrete a gaseous 
envelope and evolve to gas giants of several hundred Earth masses’. 
Gas disks around young stars are observed to dissipate within a few 
million years'*, requiring the cores of their planets to reach ten Earth 
masses within that time if they are to become gas giants. Planets 
forming in low-metallicity environments, however, may not reach 
large enough core masses before the dissipation of the gas disk, which 
could explain why we find very few gas giants around low-metallicity 
stars. Planetary accretion cannot compete with gas dissipation around 
low-metallicity stars because the number density of planetesimals is 
low'*** and gas disks dissipate sooner around low-metallicity stars’?”°. 

The semi-major axes of the orbits of the majority of the Kepler 
planets analysed in this work are less than 0.5 AU, so the detected gas 
giants in our sample were probably brought into orbits within 1 AU by 
migration’’. A decreased efficiency of migration in low-metallicity 
disks could partly explain the observed deficiency of gas giants around 
the low-metallicity stars. The formation of gas giants late in the lifetime 
of the protoplanetary gas disk would reduce their subsequent migra- 
tion because the gas disk is diluted at that stage. This could partly 
explain why we observe so few gas giants in close orbits. However, late 
planet formation will in itself suppress formation of gas giants because 
some cores are formed after the disappearance of the gas disk. Hence, 
migration cannot be the only reason for the small number of gas giants 
that we observe around low-metallicity stars. 

During the initial stages of planet formation, dust grains collide to 
form planetesimals, which represent the kilometre-sized building 
blocks of planets. In some models, planetesimal formation is only 
possible in disks with metallicities greater than the solar value’””*. 
However, our results show that small planets are present around stars 
with a wide range of metallicities. The formation of planetesimals in 
low-metallicity environments can occur if the metallicity is enhanced 
by preferential evaporation of gas in the disk, for example by photo- 
evaporation processes associated with the central star or, alternatively, 
by external sources such as nearby massive stars. In both cases, the 
removal of gas by photo-evaporation is expected to occur early in the 
lifetime of the disk, possibly within one million years’*”*. Such short 
timescales are consistent with radiometric age dating of meteorites 
suggesting that, in the Solar System, planetesimal accretion may have 
begun as early as a few hundred thousand years following formation of 
the Sun”. 

Finally, we note that some studies have proposed that a metallicity of 
at least half that of the Sun is required for the formation of terrestrial 
planets”. However, our analysis based on the Kepler planet candidates 
indicates that terrestrial planets can form at a wide range of metallicities, 
including metallicities almost four times lower than that of the Sun 
([m/H] ≈ −0.6). In addition, we find that the frequency of occurrence 
of small planets (R_P < 4.0 R_⊕) relative to that of large planets 
(R_P > 4.0 R_⊕) is ~2.7:1 for stars of metallicity greater than that of 
the Sun but increases to ~5.9:1 for stars of metallicity less than that 
of the Sun. Therefore, the formation of small, terrestrial planets does 
not require a metal-rich environment, suggesting that their existence 
might be widespread in the disk of the Galaxy. 
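
Because [m/H] is logarithmic, the linear abundance ratios quoted above follow from a power of ten. The short sketch below, with hypothetical helper names, shows the conversion in both directions for the values mentioned in the text.

```python
import math

def dex_to_linear(m_h):
    """Metal abundance relative to the Sun for a given [m/H] in dex."""
    return 10.0 ** m_h

def linear_to_dex(ratio):
    """[m/H] in dex for a given metal abundance relative to the Sun."""
    return math.log10(ratio)

for label, m_h in [("lowest small-planet hosts", -0.6),
                   ("average small-planet hosts", -0.01),
                   ("average gas-giant hosts", 0.15)]:
    print(f"{label}: [m/H] = {m_h:+.2f} -> {dex_to_linear(m_h):.2f} x solar")

# "Half that of the Sun" expressed in dex
print(f"half solar -> [m/H] = {linear_to_dex(0.5):+.2f}")
```

A value of [m/H] = −0.6 corresponds to about 0.25 times the solar abundance, that is, almost four times lower than the Sun, while half the solar abundance corresponds to [m/H] of about −0.30.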

Constraints on the volatile distribution within 
Shackleton crater at the lunar south pole 

Shackleton crater is nearly coincident with the Moon’s south pole. 
Its interior receives almost no direct sunlight and is a perennial 
cold trap’”, making Shackleton a promising candidate location in 
which to seek sequestered volatiles*. However, previous orbital and 
Earth-based radar mapping** and orbital optical imaging’ have 
yielded conflicting interpretations about the existence of volatiles. 
Here we present observations from the Lunar Orbiter Laser 
Altimeter on board the Lunar Reconnaissance Orbiter, revealing 
Shackleton to be an ancient, unusually well-preserved simple crater 
whose interior walls are fresher than its floor and rim. Shackleton 
floor deposits are nearly the same age as the rim, suggesting that 
little floor deposition has occurred since the crater formed more 
than three billion years ago. At a wavelength of 1,064 nanometres, 
the floor of Shackleton is brighter than the surrounding terrain 
and the interiors of nearby craters, but not as bright as the interior 
walls. The combined observations are explicable primarily by 
downslope movement of regolith on the walls exposing fresher 
underlying material. The relatively brighter crater floor is most 
simply explained by decreased space weathering due to shadowing, 
but a one-micrometre-thick layer containing about 20 per cent 
surficial ice is an alternative possibility. 

Detailed study of the topography of Shackleton (Fig. 1a) offers the 
opportunity to improve understanding of processes that operate in 
permanently shadowed regions (Fig. 1b). Crater geometry, age and 
preservation state are relevant for understanding the accumulation 
and preservation of volatiles as well as the processes that modify the 
lunar surface over geologic timescales. 

Our analysis uses observations from the Lunar Orbiter Laser 
Altimeter (LOLA)"®, an instrument on NASA’s Lunar Reconnaissance 
Orbiter (LRO) mission. LOLA is a five-beam laser altimeter that operates 
at a wavelength of 1,064.4 nm with a 28-Hz pulse repetition rate. From 
LRO’s mapping orbit at ~50 km altitude, the instrument illuminates 
5-m-diameter spots on the lunar surface, returning up to 140 measure- 
ments of elevation per second; the five profiles enable characterization 
of bi-directional slopes over various baselines, and roughness from 
averaging of pulse elevations. In addition, from the spreading of 
backscattered laser pulses, LOLA obtains the root-mean-square 
(RMS) roughness of the surface within laser footprints. Finally, 
from the ratio of received to transmitted laser energy, LOLA measures 
the reflectance of the lunar surface at zero phase angle at the laser 
wavelength within laser spots. 
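
The sampling figures in this paragraph can be checked with simple arithmetic. The sketch below reproduces the 140 measurements per second from five beams at 28 Hz and estimates the along-track spacing between successive shots. The lunar gravitational parameter and the circular-orbit approximation are assumptions introduced here, not values from the text.

```python
import math

BEAMS = 5                 # five-beam altimeter (from the text)
PULSE_RATE_HZ = 28.0      # pulse repetition rate (from the text)
ALTITUDE_KM = 50.0        # mapping-orbit altitude (from the text)
MOON_RADIUS_KM = 1737.4   # reference sphere radius used in the paper
MOON_GM_KM3_S2 = 4902.8   # assumption: standard lunar GM, not in the text

def measurements_per_second(beams=BEAMS, rate_hz=PULSE_RATE_HZ):
    """Maximum elevation measurements returned per second."""
    return beams * rate_hz

def ground_speed_km_s(altitude_km=ALTITUDE_KM):
    """Ground-track speed for an assumed circular orbit."""
    r = MOON_RADIUS_KM + altitude_km
    orbital_speed = math.sqrt(MOON_GM_KM3_S2 / r)
    return orbital_speed * MOON_RADIUS_KM / r

def shot_spacing_m(altitude_km=ALTITUDE_KM, rate_hz=PULSE_RATE_HZ):
    """Along-track distance between successive laser shots."""
    return ground_speed_km_s(altitude_km) * 1000.0 / rate_hz

print(measurements_per_second())
print(round(shot_spacing_m(), 1))
```

Under these assumptions, successive shots land a few tens of metres apart along track, which is why dense polar coverage from many crossing orbits is needed to build a 10-m grid.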

As of 1 December 2011, the LOLA instrument has accumulated 
more than 5.1 billion elevation measurements’’. Because Shackleton 
lies nearly at a pole, where the LOLA coverage is densest, it is possible to 
construct a digital elevation model of unprecedented spatial resolution 
and radial accuracy. More than 5,000 LOLA tracks, referenced to the 
Moon’s centre of mass via precision orbits determined from radio 
tracking’* aided by Earth-based laser tracking’*’*, were converted 


to topography. Track segments within the area of interest were geo- 
metrically corrected at orbit crossover points’. 

Figure 1a shows the topography of Shackleton crater sampled at 
10-m spatial resolution; individual measurements have an accuracy of 
~1 m with respect to the Moon’s centre of mass. The 40 km × 40 km 
topographic model of Shackleton is derived from 5.358 million eleva- 
tion measurements with an average of 0.34 altimeter measurements in 
each 10-m square area; the resolution is comparable to or better than 
other studies of Shackleton’s interior by images”, Earth-based radar® 
and orbital synthetic aperture radar’®. The topography reveals the 
near-axisymmetric bowl-shaped nature of the crater, in which the 
crater rim and interior walls are well preserved. The depth/diameter 
ratio of the crater is 0.195 + 0.025 (Table 1), which is consistent with 
other fresh simple craters’’. 
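
Two of the numbers in this paragraph can be cross-checked directly. The sketch below computes the mean number of altimeter measurements per 10-m grid cell and the depth/diameter ratio, using the measurement count from the text and the depth and diameter listed in Table 1. The function names are illustrative only.

```python
def mean_samples_per_cell(n_measurements, side_km, cell_m):
    """Average number of measurements per square grid cell."""
    cells_per_side = side_km * 1000.0 / cell_m
    return n_measurements / (cells_per_side ** 2)

def depth_to_diameter(depth_km, diameter_km):
    """Crater depth/diameter ratio, d/D."""
    return depth_km / diameter_km

density = mean_samples_per_cell(5.358e6, side_km=40.0, cell_m=10.0)
ratio = depth_to_diameter(depth_km=4.1, diameter_km=21.0)
print(f"measurements per 10-m cell: {density:.3f}")
print(f"d/D: {ratio:.3f}")
```

The density works out to roughly 0.335 measurements per cell, close to the quoted 0.34, and the ratio to about 0.195, matching the quoted central value.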

Figure 1c shows bi-directional surface slopes over 10-m baselines 
that quantify the uniform, steep inner walls; slopes approach the angle 
of repose. Slopes are greatest in the mid-levels of walls, which is in 
contrast to many crater walls on Mars, where near-vertically oriented 
cliffs of outwardly dipping coherent cap rock are exposed in the upper 
walls’®. 

Surface roughness at a scale of 20-50 m is shown in Fig. 1d. These 
data indicate that crater walls are smoother, within bounds of 
measurement uncertainty, at this spatial scale than the floor or rim, 
especially portions of lower walls aligned with mounds on the crater 
floor. Table 1 lists estimated roughness values of various crater com- 
ponents. Similarly, the average RMS roughness derived from spread- 
ing of individual laser footprints (Supplementary Information) is 
lower on the crater walls. Pulses returned from the steep walls of 
Shackleton are spread in time by >10 ns, but after correcting for the 
effect of local slope on a longer baseline, the pulse spreading due 
to surface roughness is somewhat less than on the crater floor or 
surrounding terrain. The floor can be divided into two regions, a flat 
portion and an elevated terrain. The roughness of the mound unit 
increases at the largest scales due to its hummocky character, but it 
is smoother than the flat region at smaller scales, due to its paucity of 
craters. 

Figure 2 shows a more detailed view of the topography of 
Shackleton’s floor, which highlights the irregularly distributed deposits 
and numerous small craters (see also Fig. 1e). The largest mound of 
material has a relief of ~210 m (Table 1) and the highest local slope of 
any of the floor deposits is ~25°, which is below the angle of repose. 
Two areas of the floor show fan-shaped structures consisting of material 
that has been transported downslope from the interior walls in a 
manner commonly observed in craters of this size range’. The limited 
fan material around the margins of the crater floor, combined with the 
asymmetric distribution and slope properties of deposits, suggest 
that the predominant contribution to the fill is ejecta fallback with a 
secondary contribution from slumped wall deposits. 

Figure 1 | Detailed characterization of Shackleton crater. a, Topography in 
km; b, percentage of time illuminated; c, 10-m baseline slopes in degrees; 
d, surface roughness shown as RMS residual in m; e, locations of crater counts 
used to determine relative ages; and f, zero-phase, 1,064-nm reflectance shown as 
I/F. Topography, slopes and roughness are based on a 10-m spatial resolution grid 
of all available LOLA profiles. In a-d and f, x and y axes indicate spatial scale, 
where (0, 0) is the lunar south pole and colour scales show magnitude of plotted 
quantity. White regions in b correspond to zero illumination. Panel e shows 
locations of craters counted to estimate relative age, plotted over 10-m slopes 
(colour coded as in inset). Crater regions in e correspond to: A, flat region of crater 
floor; A/B, entire crater floor; C, crater wall; D, crater rim crest; E/F, inner rim 
annulus (~5.5 km); E, inner rim annulus excluding steep region (F); F, steep rim 
region within annulus; G, crater wall section; I, Shackleton crater deposits north of 
rim in flat areas; and X, secondary crater chains and clusters (removed from 
analysis). In f, reflectance is expressed as a radiance factor (I/F), which is defined as 
the ratio of the measured radiance I to the radiance F of an ideal diffusive surface in 
vacuum with 100% reflectance under the same illumination. Each dot represents a 
0.4 km × 0.4 km pixel median average of LOLA’s spot 3 reflectance. Contours show 
topography at 0.2 km intervals. The grey annulus shows the 17-km diameter of 
the steepest portion of the walls and the 7-km diameter of the floor. 


Table 1 | Parameters describing Shackleton crater 

Parameter                                             Value 
Areocentric latitude of centre of rim (degrees)       −89.655 
Areocentric longitude of centre of rim (degrees)      129.174 
Lunar radius at floor centre (km)                     1,734.63 
Mean crater diameter at rim (km)                      21 
Mean depth, rim to floor (km)                         4.1 ± 0.05 
Mean rim height above datum (km)                      1.3 
Range of floor topography (km)                        ~0.210 
Area of crater at rim (km²)                           ~346 
Area of crater floor (km²)                            ~38 
Estimated fill depth (km)                             ~0.75 
Crater volume (km³)                                   640 ± 10 
Fill volume, including mounds (km³)                   1221 
Maximum wall slope (degrees)                          35 
Average wall slope (degrees)                          30.5 
RMS roughness* of crater exterior (m)                 ~1 
RMS roughness* of crater walls (m)                    21 
RMS roughness* of crater floor (m)                    ~1 
RMS roughness* of crater rim (m)                      ~1 
I/F of crater exterior                                0.32 ± 0.04 
I/F of interior walls                                 0.46 ± 0.03 
I/F of interior floor                                 0.43 ± 0.02 
Ratio of average depth/average rim diameter, d/D      0.195 ± 0.025 

See Fig. 1 legend for definition of I/F. 
* Within 5-m spots. 


Shackleton was previously assigned an Eratosthenian age’? (middle 
lunar history; in the approximate interval 1-3.2 Gyr before present) on 
the basis of its relatively fresh morphology, its lack of rays, and counts 
of superposed craters’ using AMIE image data (50 m per pixel) and 
Arecibo radar data (20 m per pixel). Craters were counted within a 
crater diameter (~20 km) of the rim crest, avoiding obvious secondary 
craters. This analysis was subsequently revisited and resulted in an 
older, Imbrian age’ (in the approximate interval 3.2-3.8 Gyr before 
present). 

Here we use a LOLA shaded relief map to advance previous work by 
individually dating different parts of the crater (see Fig. 1e and 
Supplementary Information) to investigate the processes that have 
operated since crater formation. LOLA observations permit dating 
of shadowed regions in the crater interior and allow spatially unbiased 
measurements of crater density due to uniformity in illumination 
conditions. In addition, illumination can be varied over the topographic 
model to enhance crater detection. On the basis of comparison of the 
several different areas of the rim of the crater, it is clear that the variable 
slopes of the rough crater rim have an influence on crater retention. For 
example, two areas of very flat terrain on the Shackleton flank within 
one crater diameter of the rim crest (I; Fig. 1e) yield modelled crater ages 
of ~3.69 Gyr, whereas areas closer to the rim crest (within ~5.5 km) 

Figure 2 | High-resolution elevation map in stereographic projection of the 
floor of Shackleton. Elevations are contoured at 5-m intervals with colours 
indicating elevation with respect to 1,737.4 km. The axes indicate spatial scales. 


yield ages of ~1.21 Gyr (F, steeper slopes; Fig. 1e) and ~2.91 Gyr 
(E, fewer steep slopes; Fig. 1e). The flat areas of the Shackleton crater 
deposit (I; Fig. 1e) indicate an age of ~3.69 Gyr (Supplementary Table 1), 
older than the originally estimated age of 1.3-3.3 Gyr (ref. 19) but close 
to the Upper Imbrian age of ~3.6 Gyr estimated subsequently’. 
Determination of ages of these individual regions permits quantitative 
investigation of how the crater has been modified. 

A critical question is the age of the permanently shadowed portions 
of the interior walls and floor of Shackleton. Analysis indicates that the 
lunar spin axis has been at its approximate current orientation for 
~2 Gyr (ref. 20). If the crater had been accumulating volatiles in the 
permanently shadowed areas over a period of this order, it is reasonable 
to hypothesize that its interior was resurfaced, covering and burying 
craters and thus producing a younger relative age. Examination of the 
steep crater wall (C; Fig. 1e) yields a much younger crater retention age 
of ~1.44 Gyr, which could be consistent with either volatile mantling or 
downslope mass wasting (Fig. 1c)’’. Examination of the permanently 
shadowed parts of the crater floor (A; Fig. 1e), the area where volatiles 
plausibly accumulated, reveals a crater retention age of ~3.60 Gyr, 
essentially identical to the flat areas of the crater rim. Including the 
rougher parts of the crater floor (B; Fig. 1e) produces an age of 
~3.29 Gyr, which almost certainly reflects the influence of downslope 
transport, as observed on the inner part of the Shackleton rim, on the 
retention of craters. The similar age of the Shackleton crater exterior 
and floor (~3.69 and 3.60 Gyr) is evidence that if volatiles accumulated 
in this cold trap for ~10^9 years, they were not in sufficient quantity to 
alter in a statistically significant sense the size-frequency distribution of 
superposed craters in the size range counted. 

Knowledge of the crater size-frequency distribution (Supplemen- 
tary Fig. 1) permits an assessment of the minimum amount of depos- 
ition that could occur without disrupting this distribution. The craters 
counted on Shackleton’s floor ranged from ~250 m up to ~500 m in 
diameter, representing a range of fresh crater depths of ~50-100m 
(ref. 22). Accumulation of volatiles to thicknesses in excess of 20-50 m 
would significantly alter the crater size-frequency distribution and 
should be observed in terms of a deficit in the number of small craters. 
A rollover in the curve at small crater diameters is observed, but is very 
similar to that seen outside the crater (I; Fig. 1e and Supplementary 
Fig. 1), and thus is more likely to be due to the typical destruction 
effects of superposed craters and other diffusive processes (such as 
micrometeorite bombardment or seismic shaking associated with 
moonquakes triggered by stresses associated with impacts or tides) 
at these diameters. On the other hand, if volatiles were cold-trapped 
by vapour diffusion into the regolith, as opposed to deposition in 
surface layers, then muting of superposed craters may not have been 
as significant. 

Figure 1f shows profiles of 1,064-nm reflectance of Shackleton crater 
and its surroundings. A previous study” obtained images of the floor of 
Shackleton from the Kaguya Terrain Camera at the time of maximum 
scattering illumination at the lunar south pole and observed no evid- 
ence for brightening; results were interpreted to indicate an absence of 
pure ice deposits on the crater floor. In the current study, LOLA pro- 
files assembled from numerous orbital passes at the most favourable 
conditions for obtaining reliable measurements of reflectance show 
that the crater walls are anomalously bright relative to the surrounding 
terrain. Similarly to other impact craters in this size range, this bright- 
ness could be due to downslope movement of material caused by 
micrometeorite and small projectile bombardment on steep slopes, 
or by seismic shaking. The cascading of regolith material downslope 
exposes optically less mature surfaces than those developed and 
retained on lower slopes. 

At a wavelength of 1,064 nm, the floor of Shackleton crater is darker 
than its interior walls, but both floor and walls are considerably 
brighter than the surrounding terrain, including the interiors of nearby 
craters that are both shadowed and sunlit. The brightness of the 
floor relative to its surroundings requires explanation. Micrometeorite 
bombardment and impingement of the solar wind produce ‘space 
weathering’ of exposed geologic materials that reddens and darkens 
their surfaces”’. The former (bombardment) would be less significant 
in the shadowed interior of Shackleton than in the surrounding region 
because the interior of the crater has not been exposed to the Sun for 
more than 2 Gyr (ref. 20), whereas the latter (solar wind impingement) 
might be enhanced in permanently shadowed craters”. Thus the floor 
brightness enhancement could be explained by a dearth of space 
weathering by micrometeorite bombardment. 

Volatile deposition is an alternative possibility. Under the conser- 
vative assumption that water ice has a 1,064-nm reflectance twice that 
of the lunar regolith” and that both are observed at zero phase, the 
measured reflectance of the floor can be explained by a micrometre- 
thick surface layer (the depth over which the laser backscatter 
measurement is sensitive) of 22% ice mixed with rock*®. Greater ice 
contents distributed throughout a thicker layer are possible but cannot 
be constrained from LOLA’s reflectivity measurement. For com- 
parison, far-ultraviolet reflectance of permanently shadowed regions 
from the LRO Lyman Alpha Mapping Project (LAMP) is consistent 
with ~1-2% surface water frost’’. 

Results from the LRO Mini-RF orbital radar for the interior of 
Shackleton"* provide additional insight. Regions with thick ice deposits 
are expected to have circular polarization ratios (CPR) >1, but such 
high ratios can also be explained by surface roughness. High-resolution 
images from ground-based radar® show that some areas with high CPR 
lie inside the rim of Shackleton. Mini-RF data'® reveal that CPR values 
decrease with depth within the crater, and CPR values on the floor of 
Shackleton crater are predominantly <1. The pixels with CPR values 
in excess of unity are distributed heterogeneously throughout the 
crater walls, correlate generally with regions of high roughness 
observed by LOLA (Fig. 1d) and include some sunlit areas (Fig. 1b). 
Although some contribution to high CPR values from volatiles is 
possible, and would imply a process in which volatile deposition 
operates very rapidly in comparison to the rate of removal, the com- 
bined data suggest that the higher floor reflectance is due primarily to 
the dearth of space weathering in this shadowed environment. 

In considering why Shackleton’s interior walls have a higher reflec- 
tance than its floor, it is instructive to note that wall brightening is not 
restricted to areas that are continuously shadowed but extends to the 
upper illuminated portions. Consequently, a higher concentration of 
surface volatiles on the walls than present on the floor is an unlikely 
explanation. More likely is downslope movement of regolith material 
on the steep crater walls that has exposed brighter underlying material; 
downslope movement is consistent with the observed slopes near the 
angle of repose and the roughness and morphology of Shackleton’s 
interior walls (Fig. 1c, d), as well as with Earth-based radar backscatter
and Mini-RF measurements.


METHODS SUMMARY 


LOLA, an instrument on board the LRO spacecraft, outputs five beams per laser 
pulse that are backscattered from the lunar surface and detected in the instru- 
ment’s receiver. The relevant measurement is the time of flight of each individual 
laser pulse, which can be converted to a range of the spacecraft to the lunar surface 
given knowledge of the position of the spacecraft with respect to the Moon’s centre 
of mass. Radial range errors were minimized by geometric adjustment of altimetric 
tracks within the study area. In practice, profiles of one-way range from the LOLA 
instrument to the lunar surface along the spacecraft ground track were converted 
to lunar radius at each bounce point using the reconstructed orbit of LRO. 
Topography was determined by subtracting a sphere of radius 1,737.4 km from each
radius measurement. 

Slopes were calculated from a two-laser-spot fit at 20-50-m length scales, and 
RMS roughness represents a standard deviation about a plane fitted by least- 
squares to two laser shots along-track, from which at least four valid spots are 
returned out of a possible total of ten. The plane has dimensions of 90-100 m in the
longest axis and 10-40 m in the shortest axis, depending on the positions of the 
spots returned. 

The ratio of returned to transmitted pulse energy is a measure of surface reflec- 
tance at the laser wavelength. The transmitted and returned pulse energies were 
measured by integrating the area under the pulses. Observations of 1,064-nm 
reflectance were derived from LOLA tracks crossing Shackleton for days 130- 
149 in 2010. 
Full Methods and any associated references are available in the online version of
the paper at www.nature.com/nature.

METHODS


The Lunar Orbiter Laser Altimeter (LOLA), an instrument aboard the Lunar 
Reconnaissance Orbiter (LRO) spacecraft, outputs five beams per laser pulse that 
are backscattered from the lunar surface and detected in the instrument’s receiver. 
The relevant measurement is the time of flight of each individual laser pulse, which 
can be converted to a range of the spacecraft to the lunar surface given knowledge 
of the position of the spacecraft with respect to the Moon’s centre of mass. 
The timing of LOLA instrument events was derived from the LRO ultrastable 
oscillator, which is monitored by ground tracking stations. Time systems on board 
LRO used Coordinated Universal Time (UTC) to correlate spacecraft Mission 
Elapsed Time (MET) to ground time. The analysis of LOLA data used 
Barycentric Dynamical Time as its primary time system. Spacecraft states relative 
to the Solar System Barycentre (SSB) at the laser transmit and detector receive 
times were projected along the instrument boresight and return path vectors to 
match the observed time of flight, correcting for the aberration of light and 
general-relativistic time delays. SSB states were determined in the Earth Mean 
Equator of 2000 (J2000) inertial reference frame using lunar spacecraft trajectories 
and the DE421 planetary ephemeris.

During its lunar mapping mission, the LRO spacecraft is tracked using S-band 
Doppler and range data by the Universal Space Network, Deep Space Network and 
White Sands Missile Range. The precise reconstruction of LRO orbits used 
Doppler tracking observations from these stations as well as laser ranging to 
LRO from the Goddard Space Flight Center and participating members of the
International Laser Ranging Service. Precision orbit determination was
accomplished using the NASA/Goddard Space Flight Center’s GEODYN system
of programs, with the GLGM-3 gravity model as a reference. GEODYN
numerically integrates the spacecraft Cartesian state and force-model partial 
derivatives by employing a high-order Cowell predictor-corrector model. In addi- 
tion to a model of the lunar gravity field, the force modelling included point mass 
representations for the Sun and planets. Solar radiation pressure, measurement 
and timing biases, and tracking station coordinates were also estimated. 

Radial range errors were minimized by geometric adjustment of altimetric 
tracks within the study area. In practice, profiles of one-way range from the 
LOLA instrument to the lunar surface along the spacecraft ground track were 
converted to lunar radius at each bounce point using the reconstructed orbit of 
LRO. Topography (Fig. 1a) was determined by subtracting a sphere of radius
1,737.4 km from each radius measurement. Topographic measurements were binned
and interpolated within a 40 km × 40 km area at 10-m spatial resolution.
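The conversion chain described above (time of flight to range, range to lunar radius, radius to topography) can be sketched as follows. This is a deliberately simplified illustration that assumes nadir pointing and ignores the aberration and relativistic corrections applied in the actual processing; the function names and example numbers are hypothetical.

```python
C_KM_S = 299_792.458          # speed of light in vacuum, km/s
REFERENCE_RADIUS_KM = 1737.4  # reference sphere quoted in the text


def one_way_range_km(time_of_flight_s):
    """One-way range from the round-trip time of flight of a laser pulse."""
    return 0.5 * C_KM_S * time_of_flight_s


def topography_km(spacecraft_radius_km, time_of_flight_s):
    """Surface height relative to the reference sphere (nadir pointing assumed).

    spacecraft_radius_km is the distance of the spacecraft from the Moon's
    centre of mass, as supplied by the reconstructed orbit.
    """
    surface_radius = spacecraft_radius_km - one_way_range_km(time_of_flight_s)
    return surface_radius - REFERENCE_RADIUS_KM


# Hypothetical example: spacecraft 50 km above the reference sphere,
# pulse returning from a surface 52 km below the spacecraft.
tof = 2.0 * 52.0 / C_KM_S
print(topography_km(1787.4, tof))  # about -2 km, i.e. 2 km below the sphere
```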


The LOLA digital elevation model (DEM) used in this analysis was combined 
with a lunar ephemeris to characterize the solar illumination conditions of
Shackleton and surroundings (Fig. 1b). A polar gnomonic projection on which 
great circle paths plot as straight lines was applied to the DEM to calculate lighting 
conditions that are accurate over geological timescales. 

The confluence of orbit ground tracks in the vicinity of the lunar poles, com- 
bined with LOLA’s multi-beam profiling capability, permitted slopes over a range 
of baselines and directions to be determined. In this study, slopes (Fig. 1c) were 
calculated from a two-laser-spot fit at 20-50-m length scales. In similar fashion, 
RMS roughness (Fig. 1d) was calculated as a standard deviation about a plane 
fitted by least-squares to two laser shots along-track, from which at least four valid 
spots were returned out of a possible total of ten. The plane had dimensions of 
90-100 m in the longest axis and 10-40 m in the shortest axis, depending on the 
positions of the spots returned. An independent measure of surface roughness, 
discussed in Supplementary Information and shown in Supplementary Fig. 2, used 
the spreading in time of backscattered pulses. The spreading of LOLA’s 
backscattered pulses provides a measure of the RMS roughness of the surface at 
a smaller scale—the 5-m diameter of the laser footprints on the lunar surface. 
As with plane-deviation roughness shown in Fig. 1d, the pulse-spread-derived 
roughness featured a correction for local slopes. 
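As an illustration of the two quantities defined above, the sketch below computes a two-spot slope and the RMS residual about a least-squares plane. It assumes spot positions are given as local Cartesian (x, y, z) coordinates in metres and uses the population RMS of the residuals; the helper names are hypothetical and the pulse-spread roughness is not covered.

```python
import math


def two_spot_slope_deg(p, q):
    """Slope in degrees between two laser spots given as (x, y, z) in metres."""
    baseline = math.hypot(q[0] - p[0], q[1] - p[1])
    return math.degrees(math.atan2(abs(q[2] - p[2]), baseline))


def _solve3(m, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [r] for row, r in zip(m, rhs)]
    for i in range(3):
        piv = max(range(i, 3), key=lambda r: abs(a[r][i]))
        if abs(a[piv][i]) < 1e-12:
            raise ValueError("spots are collinear; the plane is undefined")
        a[i], a[piv] = a[piv], a[i]
        for r in range(i + 1, 3):
            f = a[r][i] / a[i][i]
            for c in range(i, 4):
                a[r][c] -= f * a[i][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (a[i][3] - sum(a[i][c] * x[c] for c in range(i + 1, 3))) / a[i][i]
    return x


def plane_rms_roughness(spots):
    """RMS residual about the least-squares plane z = a*x + b*y + c."""
    if len(spots) < 4:
        raise ValueError("need at least four valid spots")
    sxx = sum(x * x for x, _, _ in spots)
    sxy = sum(x * y for x, y, _ in spots)
    syy = sum(y * y for _, y, _ in spots)
    sx = sum(x for x, _, _ in spots)
    sy = sum(y for _, y, _ in spots)
    sxz = sum(x * z for x, _, z in spots)
    syz = sum(y * z for _, y, z in spots)
    sz = sum(z for _, _, z in spots)
    n = float(len(spots))
    a, b, c = _solve3([[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]],
                      [sxz, syz, sz])
    resid = [z - (a * x + b * y + c) for x, y, z in spots]
    return math.sqrt(sum(r * r for r in resid) / n)
```

Because the plane is removed before the residuals are taken, a uniformly tilted but smooth surface yields zero roughness, which is the sense in which the roughness is corrected for local slope.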

The ratio of returned to transmitted pulse energy is a measure of surface 
reflectance at the laser wavelength of 1,064 nm. The transmitted and returned
pulse energies were measured by integrating the area under the pulses. LOLA’s 
measurement of reflectance is calibrated only in a relative sense, with respect to 
pre-launch testing, as the instrument lacks a source with known brightness in 
flight. Observations of reflectance were derived from LOLA tracks crossing 
Shackleton and environs from days of year 130-149 of 2010, which represented the
most favourable time period for stable reflectance measurements due to the 
geometry of spacecraft terminator crossings. 
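The energy ratio described above can be sketched by integrating sampled pulse shapes. The example assumes each pulse is available as arrays of sample times and amplitudes and uses trapezoidal integration; the result is an uncalibrated, relative quantity, and range and receiver corrections are omitted. The function names are hypothetical.

```python
def pulse_energy(times, amplitudes):
    """Area under a sampled pulse, by the trapezoidal rule."""
    if len(times) != len(amplitudes) or len(times) < 2:
        raise ValueError("need matching arrays with at least two samples")
    return sum(
        0.5 * (amplitudes[i] + amplitudes[i + 1]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )


def relative_reflectance(tx_times, tx_amps, rx_times, rx_amps):
    """Ratio of returned to transmitted pulse energy (relative units only)."""
    return pulse_energy(rx_times, rx_amps) / pulse_energy(tx_times, tx_amps)


# Hypothetical triangular pulses on a common time base (arbitrary units).
t = [0.0, 1.0, 2.0]
print(relative_reflectance(t, [0.0, 2.0, 0.0], t, [0.0, 0.5, 0.0]))
```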
Electronic nematicity, a unidirectional self-organized state that
breaks the rotational symmetry of the underlying lattice, has
been observed in the iron pnictide and copper oxide high-temperature
superconductors. Whether nematicity plays an equally important role in
these two systems is highly controversial. In iron pnictides, the
nematicity has usually been associated with the
tetragonal-to-orthorhombic structural transition at temperature T_s.
Although recent experiments have provided hints of nematicity, they
were performed either in the low-temperature orthorhombic phase or in
the tetragonal phase under uniaxial strain, both of which break the
90° rotational C_4 symmetry. Therefore, the question remains open
whether the nematicity can exist above T_s without an external driving
force. Here we report magnetic torque measurements of the
isovalent-doping system BaFe2(As1−xPx)2, showing that the nematicity
develops well above T_s and, moreover, persists to the non-magnetic
superconducting regime, resulting in a phase diagram similar to the
pseudogap phase diagram of the copper oxides. By combining these results
with synchrotron X-ray measurements, we identify two distinct
temperatures: one at T*, signifying a true nematic transition,
and the other at T_s (<T*), which we show not to be a true phase
transition, but rather what we refer to as a ‘meta-nematic transition’,
in analogy to the well-known meta-magnetic transition in
the theory of magnetism.

Magnetic torque measurements provide a stringent test of nematicity
for systems with tetragonal symmetry. The torque τ = μ0 V M × H is a
thermodynamic quantity, a differential of the free energy with respect to
angular displacement. Here μ0 is the permeability of vacuum, V is the
sample volume, and M is the magnetization induced in the magnetic
field H. When H is rotated within the tetragonal a-b plane (Fig. 1a, b), τ
is a periodic function of 2φ, where φ is the azimuthal angle measured
from the a axis:

\[ \tau_{2\phi} = \frac{1}{2}\,\mu_0 H^2 V \left[ (\chi_{aa} - \chi_{bb}) \sin 2\phi - 2\chi_{ab} \cos 2\phi \right] \tag{1} \]

where the susceptibility tensor χ_ij is defined by M_i = Σ_j χ_ij H_j. In a
system maintaining tetragonal symmetry, τ_2φ should be zero, because
χ_aa = χ_bb and χ_ab = 0. Finite values of τ_2φ appear if a new electronic
or magnetic state emerges that breaks the C_4 tetragonal symmetry. In such
a case, rotational symmetry breaking is revealed by χ_aa ≠ χ_bb and/or
χ_ab ≠ 0, depending on the direction of the nematicity.
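A short numerical check of equation (1) is given below as a sketch. It evaluates the out-of-plane component of μ0 V M × H directly from an assumed in-plane susceptibility tensor and compares it with the closed form; the function names and the numerical values are hypothetical and chosen only for illustration.

```python
import math

MU0 = 4e-7 * math.pi  # permeability of vacuum in SI units (approximate)


def tau_2phi(phi, chi_aa, chi_bb, chi_ab, H, V):
    """Closed-form two-fold torque of equation (1)."""
    return 0.5 * MU0 * H**2 * V * (
        (chi_aa - chi_bb) * math.sin(2 * phi) - 2 * chi_ab * math.cos(2 * phi)
    )


def torque_from_cross_product(phi, chi_aa, chi_bb, chi_ab, H, V):
    """c-axis component of mu0 * V * (M x H) with M_i = sum_j chi_ij * H_j."""
    ha, hb = H * math.cos(phi), H * math.sin(phi)
    ma = chi_aa * ha + chi_ab * hb
    mb = chi_ab * ha + chi_bb * hb
    return MU0 * V * (ma * hb - mb * ha)


# Tetragonal symmetry: chi_aa == chi_bb and chi_ab == 0 gives no torque.
print(tau_2phi(0.3, 1e-4, 1e-4, 0.0, H=1.0, V=1.0))

# Nematicity along [110]: chi_ab != 0 gives a pure cos(2*phi) dependence.
print(tau_2phi(0.0, 1e-4, 1e-4, 2e-6, H=1.0, V=1.0))
```

The agreement between the two functions for arbitrary angles is simply the trigonometric identity that turns the cross product into the sin 2φ and cos 2φ terms.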

BaFe2(As1−xPx)2 is a prototypical family of iron pnictides, whose
phase diagram is displayed in Fig. 1c. The temperature evolution of the
torque τ(φ) for the optimally doped compound (x = 0.33) is depicted in
the upper panels of Fig. 1d. The two- and four-fold oscillations, τ_2φ and
τ_4φ, obtained from the Fourier analysis are shown respectively in the
middle and lower panels of Fig. 1d. The distinct two-fold oscillations
appear at low temperatures, whereas they are absent at high temperatures
Figure 1 | Torque magnetometry and the doping-temperature phase
diagram of BaFe2(As1−xPx)2. a, b, Schematic representations of the
experimental configuration for torque measurements under in-plane field
rotation. In a nematic state, domain formation with different preferred
directions in the a-b plane (‘twinning’) will occur. We used very small single
crystals with typical size ~70 µm × 70 µm × 30 µm, in which a significant
difference in volume between the two types of domains enables the observation
of uncompensated τ_2φ signals. The equation given in the figure for τ assumes
unit volume; see text for details. A single-crystalline sample (brown block) is
mounted on the piezo-resistive lever which is attached to the base (blue block)
and forms an electrical bridge circuit (orange lines) with the neighbouring
reference lever. A magnetic field H can be rotated relative to the sample, as
illustrated by a blue arrow on a sphere. In this experiment, the field is precisely
applied in the a-b plane. c, Phase diagram of BaFe2(As1−xPx)2. This system is
clean and homogeneous, as demonstrated by the quantum oscillations
observed over a wide x range. The antiferromagnetic transition at T_N (filled
circles) coincides with or is preceded by the structural transition at T_s (open
triangles). The superconducting dome extends over a doping range
0.2 < x < 0.7 (open squares), with maximum T_c = 31 K. Crosses indicate the
nematic transition temperature T* determined by the torque and synchrotron
X-ray diffraction measurements. The insets illustrate the tetragonal FeAs/P
layer. χ_ab = 0 above T*, yielding an isotropic torque signal (green-shaded
circle), whereas χ_ab ≠ 0 below T*, indicating the appearance of the nematicity
along the [110] (Fe-Fe bond) direction, illustrated with the green-shaded
ellipse. d, The upper panels depict the temperature evolution of the raw torque
τ(φ) at μ0H = 4 T for BaFe2(As0.67P0.33)2 (T_c = 30 K). All torque curves are
reversible with respect to the field rotation. τ(φ) can be decomposed as
τ(φ) = τ_2φ + τ_4φ + τ_6φ + ···, where τ_2nφ = A_2nφ sin 2n(φ − φ0) has 2n-fold
symmetry with integer n. The middle and lower panels display the two- and
four-fold components obtained from Fourier analysis. The four-fold
oscillations τ_4φ (and higher-order terms) arise primarily from the nonlinear
susceptibilities. a.u., arbitrary units.
(middle panels of Fig. 1d). As shown in the upper panel of Fig. 2a, the
amplitude of the two-fold oscillation |A_2φ| is nearly zero at high
temperatures, and grows rapidly below T* ≈ 85 K. These results clearly
indicate that the tetragonal C_4 symmetry, which is preserved at high
temperatures, is broken below T*, demonstrating the formation of the
electronic nematic phase at T*. The two-fold oscillations below T*
follow the functional form τ_2φ = A_2φ cos 2φ, meaning that χ_aa = χ_bb
and χ_ab ≠ 0, which indicates the nematicity along the tetragonal
[110] direction, that is, the Fe-Fe bond direction (Fig. 1c, inset).
Anomalies at T* can also be seen in the synchrotron X-ray diffraction
measurements (middle panel of Fig. 2a), in which the full-width at
half-maximum (FWHM) of the high-order Bragg peaks at T < T*
(Fig. 2a, red circles) grows more quickly as temperature is reduced
than does the linear extrapolation from above T*, and is accompanied
by the suppression of the peak intensity (green circles). This indicates a
broadening (or small splitting) of the Bragg peak below T*, implying
domain formation due to the nematicity which, to some extent, couples
to the orthorhombic lattice distortion (as discussed later).
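The Fourier analysis that separates the torque into two-fold, four-fold and higher components can be sketched as below. The example assumes the torque has been sampled at evenly spaced angles over a full rotation and extracts the amplitude of a chosen harmonic; the function name and the synthetic torque curve are hypothetical.

```python
import math


def harmonic_amplitude(torque, k):
    """Amplitude of the sin/cos(k*phi) harmonic of a torque curve.

    torque holds samples at phi_i = 2*pi*i/n for i = 0..n-1 (full rotation).
    k = 2 gives the two-fold amplitude, k = 4 the four-fold amplitude.
    """
    n = len(torque)
    if n <= 2 * k:
        raise ValueError("too few samples to resolve this harmonic")
    c = sum(t * math.cos(k * 2.0 * math.pi * i / n) for i, t in enumerate(torque))
    s = sum(t * math.sin(k * 2.0 * math.pi * i / n) for i, t in enumerate(torque))
    return 2.0 * math.hypot(c, s) / n


# Synthetic curve: two-fold term of amplitude 3 plus four-fold term of 0.5.
n = 360
phis = [2.0 * math.pi * i / n for i in range(n)]
synthetic = [3.0 * math.sin(2 * (p - 0.4)) + 0.5 * math.sin(4 * p) for p in phis]
print(harmonic_amplitude(synthetic, 2), harmonic_amplitude(synthetic, 4))
```

The amplitude is independent of the phase offset φ0, so a growth of the two-fold amplitude on cooling can be tracked without first fixing the orientation of the anisotropy.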

Figure 2b-e shows the results for the lower-concentration samples
(x = 0.27, 0.23, 0.14, 0) exhibiting both the tetragonal-to-orthorhombic
and magnetic transitions. In these crystals, |A_2φ| is finite even at 200 K
(grey circles in upper panels), and initially increases with decreasing
temperature, exhibiting a cusp-like peak at T*. We attribute T* to the
electronic nematic transition temperature for the following reasons.
Similarly to the optimally doped compound, the X-ray FWHM and
intensity change their slope at T* (middle panels in Fig. 2b-e).
Moreover, the two-fold term in these compounds can be clearly
separated into two components (see Supplementary Information) as
τ_2φ = A_2φ^nem cos 2φ + A_2φ^ext sin 2(φ − φ_ext), in which φ_ext is
temperature independent and A_2φ^ext has a smooth temperature dependence
with the phase φ_ext that is sample dependent. This demonstrates that the
observed two-fold oscillations contain the [110] nematic term
A_2φ^nem cos 2φ in addition to some other, extrinsic two-fold term. The
obtained nematic amplitude A_2φ^nem(T) appears below T* (blue circles
in upper panels); this behaviour is well reproduced for different crystals
of the same composition (Fig. 2b and Supplementary Fig. 3). The origin
of the non-zero two-fold signal at temperatures above T* is not clear,
but the sample-dependent phase φ_ext points to the possibility of
impurity-induced in-plane magnetic anisotropy. In contrast to
previous experiments, the present measurements have been performed
in the absence of uniaxial stress, providing thermodynamic evidence
for the electronic nematic transition at T*, well above the structural
lattice transition temperature T_s. The continuous doping dependence
of T*(x) displayed in Fig. 1c indicates that nematicity persists over a
wide range of doping, covering the non-magnetic superconducting
regime.

Clearly, there cannot be two nematic phase transitions at both T_s
and T*, because the C_4 rotational symmetry can only be broken once.
Our measurements show that the temperature T* (>T_s) marks the
onset of the true phase transition, accompanied by the nematic two-fold
torque component A_2φ ≠ 0. This raises the question as to what
happens at the structural transition temperature, T_s. This question can
be answered straightforwardly if one considers the Landau free energy
expansion in terms of two order parameters, the orthorhombic lattice
distortion δ = (a − b)/(a + b) and the phenomenological electronic
nematic parameter ψ (proportional to the measured torque amplitude
A_2φ), which can be written as follows:

\[ F[\delta, \psi] = \left[ t_s \delta^2 - u \delta^4 + v \delta^6 \right] + \left[ t_p \psi^2 + w \psi^4 + O(\psi^6) \right] - g \psi \delta \tag{2} \]

with the terms in the first set of square brackets on the right-hand side
describing the first-order structural phase transition and the second
bracket responsible for the (second-order) nematic phase transition.




Figure 2 | Nematic and meta-nematic transitions. Temperature dependence
of the two-fold oscillation amplitude |A_2φ| of the torque (blue circles in upper
panels), the FWHM and peak intensity (red and green circles in middle panels,
respectively) of the synchrotron X-ray Bragg reflection, the in-plane resistivity
ρ (black lines in lower panels), and the torque amplitude for polar-angle
rotation (pink circles in lower panels) are shown for x values of 0.33 (a), 0.27
(b), 0.23 (c), 0.14 (d) and 0 (e). The nematic transition temperature T* and the
meta-nematic transition temperature T_s are defined by red and grey vertical
dashed lines, respectively, as well as by arrows in the upper panels. For
underdoped (x = 0.27) and parent (x = 0) crystals, the intrinsic nematic signal
for |A_2φ| is extracted from the raw data (grey circles in upper panels) with
subtraction of the smooth background of the two-fold oscillations,
A_2φ^ext sin 2(φ − φ_ext), above T* (Supplementary Fig. 2). For x = 0.27, the
sample dependence is also shown. The X-ray data have been analysed for several
Bragg peaks in the following tetragonal directions of the reciprocal space:
[10,0,0]_T (a), [7,7,0]_T (b, e) or [8,8,0]_T (c, d). For x = 0.23 a clear
splitting of the Bragg peaks is observed below T_s (Supplementary Fig. 5). The
dotted lines are guides to the eye. For x = 0, 0.27 and 0.33, the same crystal
has been used for the torque and X-ray measurements. We find no clear anomaly
in the resistivity ρ(T) and the torque signal for polar-angle rotation (which
would detect the a-c anisotropy of susceptibility; lower panel), indicating
that the anomalies are associated only with the in-plane nematicity. The
magnitude of |A_2φ| has only a weak field dependence (Supplementary Fig. 4),
implying that the nematic anisotropy prevails in the zero-field limit, which
is consistent with the X-ray results taken at zero field.
Here u, v and w are phenomenological Landau coefficients describing
these transitions, and g is the coupling strength between the
two order parameters. The temperature-dependent coefficients
t_s = (T − T_s^(0))/T_s^(0) and t_p = (T − T_p^(0))/T_p^(0) were chosen such
that in the absence of the coupling between the two order parameters,
the structural transition occurs at lower temperature T_s^(0) (< T_p^(0)).

Thanks to the linear coupling between the two order parameters, as
expressed in the last term in equation (2), both ψ and δ develop non-zero
values below the nematic transition temperature T* (Fig. 3a, b).
On the other hand, T_s ceases to be a true phase transition, since the C_4
symmetry is already broken on either side of T_s, and the lattice
distortion δ is non-zero over the entire temperature range (Fig. 3a). Instead,
both δ and ψ undergo a finite jump at T_s, as illustrated in Fig. 3. We call
this a ‘meta-nematic transition’, in analogy to the meta-magnetic
transition in the theory of magnetism, where the magnetization
undergoes a jump as a function of temperature or applied magnetic
field, but remains non-zero on both sides of the transition. The analysis
of the free energy below T* shows that it exhibits a maximum at ψ = 0
and a single minimum at finite ψ, as in the second-order Landau phase
transition (Supplementary Fig. 6).
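How the linear coupling shifts the onset of order can be illustrated with a small sketch. It assumes the free energy has the form F = [t_s δ² − u δ⁴ + v δ⁶] + [t_p ψ² + w ψ⁴] − g ψ δ with reduced temperatures t = (T − T0)/T0, as read from the text, and locates the temperature at which the state δ = ψ = 0 first becomes unstable, namely where 4 t_s t_p = g². The function names are hypothetical, and the parameter values are those quoted for the parent compound in the caption of Fig. 3; the sketch is not a reproduction of the fitting procedure.

```python
import math


def free_energy(delta, psi, T, Ts0, Tp0, u, v, w, g):
    """Landau free energy in the assumed form of equation (2)."""
    ts = (T - Ts0) / Ts0
    tp = (T - Tp0) / Tp0
    return ((ts * delta**2 - u * delta**4 + v * delta**6)
            + (tp * psi**2 + w * psi**4)
            - g * psi * delta)


def nematic_onset(Ts0, Tp0, g):
    """Highest T at which delta = psi = 0 stops being a local minimum.

    The quadratic part ts*delta^2 + tp*psi^2 - g*psi*delta loses stability
    when 4*ts*tp = g^2; this returns the larger root of that condition.
    """
    return 0.5 * (Ts0 + Tp0 + math.sqrt((Ts0 - Tp0) ** 2 + Ts0 * Tp0 * g**2))


# Without coupling the onset is just the bare nematic temperature.
print(nematic_onset(52.0, 117.0, 0.0))

# With coupling the onset moves to a higher temperature.
print(nematic_onset(52.0, 117.0, 1.91))
```

In this picture any non-zero g raises the onset temperature above the bare value, and both order parameters become non-zero together below it, which is the renormalization referred to in the text.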

To quantify the lattice distortion δ experimentally, we have analysed
the X-ray data using a two-peak fitting procedure (Supplementary Fig. 5),
which reveals that the data in the region T_s < T < T* can be fitted with
very small but finite δ. The results, δ(T), can be well reproduced within
Figure 3 | Temperature dependence of the nematic order parameter and
lattice distortion. a, Lattice distortion δ (= (a − b)/(a + b)) and b, the nematic
order parameter ψ, proportional to the measured A_2φ component of the torque in
the paramagnetic temperature region, are fitted to the theory (lines) based on the
Landau free energy expansion (equation (2)) using the same set of parameters
(Supplementary Information). For the parent compound (x = 0, red): u = 3.28,
v = 4.08, w = 4.52, g = 1.91, T_s^(0) = 52 K, T_p^(0) = 117 K; for x = 0.14
(green): u = 9.48, v = 314, w = 315, g = 2.72, T_s^(0) = 35 K, T_p^(0) = 79 K;
for x = 0.23 (black): u = 8.57, v = 27.3, w = 224, g = 2.34, T_s^(0) = 26 K,
T_p^(0) = 79 K; and for x = 0.27 (blue): u = 4.37, v = 10.07, w = 42.58,
g = 3.11, T_s^(0) = 24 K, T_p^(0) = 56 K. Note that because of the coupling
between the order parameters both T_s and T* get renormalized compared to their
initial values T_s^(0) and T_p^(0). The experimental errors in δ (error bars,
s.e.m.) are estimated in the two-peak fitting procedure from the width of the
Bragg peaks (Supplementary Information). In the antiferromagnetic phase below
T_N, the amplitude A_2φ(T) shows deviations from ψ(T) (dashed lines in b) for
x ≤ 0.14, where effects of the antiferromagnetic moment on torque may be large.
the framework of equation (2), as Fig. 3a demonstrates. Moreover, the
same set of Landau parameters also fits the temperature dependence of
the nematic order parameter ψ ∝ A_2φ well, except below the Néel
temperature T_N, where additional effects of antiferromagnetism may enter
the magnetic torque anisotropy (Fig. 3b). We have thus established
that the true thermodynamic transition occurs at T = T*, and is
accompanied by the development of non-zero values of both the
nematic order parameter ψ and the lattice distortion δ. We note that
similarly small but non-zero values of δ have been recently reported
in the powder diffraction measurements of SmFeAs(O1−xFx).

Under applied uniaxial strain, the lattice distortion δ is expected to
develop a non-zero value even above the tetragonal-to-orthorhombic
structural transition temperature, and by virtue of linear coupling to
the nematic order parameter in equation (2), this will lead to an
increase in the value of the electronic anisotropy ψ. This reasoning
is consistent with the observation of resistivity anisotropy in the
detwinned samples above the structural transition.

We note that the above explanation in terms of coupling between the
electronic and lattice degrees of freedom is very generic, and does not
depend on the precise microscopic nature of the nematic order parameter
ψ, be it caused by Z_2 spin-nematic ordering, or by orbital
ordering. In fact, both mechanisms can cooperate to break the C_4
symmetry spontaneously. In the spin-nematic scenario, the instability is
driven by thermal spin fluctuations above the antiferromagnetically
ordered phase, potentially applicable to the x < 0.30 regime in
BaFe2(As1−xPx)2 (ref. 15). In the absence of long-range magnetic ordering
(x > 0.30), quantum spin fluctuations can in principle still drive the
instability; however, the fact that the nematic transition at T* occurs
even for superconducting samples far away from the antiferromagnetic
phase (see Fig. 1c) indicates that a different mechanism may be at
play. In particular, orbital ordering may turn out to be more important,
where the electronic nematic transition naturally occurs as a result of
orbital polarization between the Fe d_xz and d_yz orbitals, ψ ∝ (n_xz − n_yz),
where n_xz (n_yz) denotes the occupation of the iron d_xz (d_yz) orbital. This
mechanism of nematicity is supported by the recent angle-resolved
photoemission spectroscopy (ARPES) and quadrupolar resonance
measurements.

There is a growing body of evidence that entanglement of the spin
and orbital degrees of freedom leads to emergent novel electronic
phases in the iron pnictides. The present temperature-doping phase
diagram bears a resemblance to that of the high-transition-temperature
(high-T_c) copper oxides, in that the suppression of the antiferromagnetic
ground state leads to the emergence of high-T_c superconductivity,
and the electronic nematic instability occurs well above the magnetic
and superconducting transitions. Recent infrared studies of charge
dynamics report the formation of a pseudogap in the excitation
spectrum of optimally doped BaFe2(As1−xPx)2 below ~100 K (S. J.
Moon et al., unpublished results). Further studies are therefore needed
to clarify how the nematic transition we report here is related to the
pseudogap formation in the iron pnictides.
Multiscale gigapixel photography


D. J. Brady', M. E. Gehm?, R. A. Stack?, D. L. Marks!, D. S. Kittle, D. R. Golish’, E. M. Vera? & S. D. Feller! 


Pixel count is the ratio of the solid angle within a camera’s field of 
view to the solid angle covered by a single detector element. Because 
the size of the smallest resolvable pixel is proportional to aperture 
diameter and the maximum field of view is scale independent, the 
diffraction-limited pixel count is proportional to aperture area. At 
present, digital cameras operate near the fundamental limit of 
1-10 megapixels for millimetre-scale apertures, but few approach 
the corresponding limits of 1-100 gigapixels for centimetre-scale 
apertures. Barriers to high-pixel-count imaging include scale- 
dependent geometric aberrations, the cost and complexity of 
gigapixel sensor arrays, and the computational and communica- 
tions challenge of gigapixel image management. Here we describe 
the AWARE-2 camera, which uses a 16-mm entrance aperture to 
capture snapshot, one-gigapixel images at three frames per minute. 
AWARE-2 uses a parallel array of microcameras to reduce the 
problems of gigapixel imaging to those of megapixel imaging, which 
are more tractable. In cameras of conventional design, lens speed 
and field of view decrease as lens scale increases, but with the
experimental system described here we confirm previous theoretical
results suggesting that lens speed and field of view can be
scale independent in microcamera-based imagers resolving up to 
50 gigapixels. Ubiquitous gigapixel cameras may transform the 
central challenge of photography from the question of where to 
point the camera to that of how to mine the data. 

AWARE-2 is a monocentric, multiscale camera with 120°-by-50°
field of view (FOV) and a 38-µrad instantaneous FOV (the angular
extent of a single pixel). A monocentric, multiscale camera consists of a
spherically symmetric objective lens surrounded by an array of
secondary microcameras. AWARE-2 includes 98 microcameras,
each with a 14-megapixel sensor. It was constructed as part of the US 
Defense Advanced Research Projects Agency AWARE programme, 
which focuses on creating a microcamera platform for scalable, 1-100- 
gigapixel cameras. Just as the microprocessor is a platform for scalable 
parallel computing, the microcamera is a platform for scalable parallel 
cameras with diverse applications in wide-field microscopy, event 
capture, persistent surveillance and space awareness. As with micro- 
processors, the designer of microcameras must address granularity 
and performance trade-offs in selecting aperture and focal plane size, 
materials and components, and interconnection and processing 
architecture. Details of the AWARE-2 system design are presented 
in Supplementary Information, sections 2 and 3. In this Letter, we 
compare AWARE-2 with previous gigapixel-scale imaging systems, 
illustrate the capture of a large-scale dynamic event, demonstrate the 
capacity for high-dynamic-range (HDR) imaging in microcamera 
systems and analyse the camera’s optical resolution. 
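The definition of pixel count as a ratio of solid angles can be turned into a rough estimate for the figures quoted above. The sketch below assumes a field of view that is rectangular in azimuth and elevation and centred on the horizon, with square pixels of the stated instantaneous FOV; it is an idealized estimate and is not expected to match the size of a stitched image exactly. The function name is hypothetical.

```python
import math


def pixel_count(h_fov_deg, v_fov_deg, ifov_rad):
    """Solid angle of the field of view divided by that of one pixel.

    The FOV is modelled as an azimuth-elevation rectangle centred on the
    horizon, whose solid angle is d_azimuth * 2 * sin(v_fov / 2).
    """
    solid_angle = math.radians(h_fov_deg) * 2.0 * math.sin(math.radians(v_fov_deg) / 2.0)
    return solid_angle / ifov_rad**2


# Figures quoted in the text: 120 deg by 50 deg FOV, 38 microradian pixels.
estimate = pixel_count(120.0, 50.0, 38e-6)
sensor_pixels = 98 * 14e6  # 98 microcameras with 14-megapixel sensors
print(f"{estimate:.2e} resolvable pixels, {sensor_pixels:.2e} sensor pixels")
```

Both numbers come out at roughly one gigapixel, which is consistent with the raw sensor count exceeding the resolvable pixel count by a modest margin, as would be needed for overlap between neighbouring microcameras.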

A gigapixel camera requires a lens system capable of resolving more
than 10^9 elements and detectors containing more than 10^9 elements.
Designs that address these challenges may be segmented into terrestrial
cameras with horizontal 90-120° FOVs (AWARE and Asymmagon;
see http://www.gigapxl.org/), airborne surveillance cameras with
cylindrical 60-70° FOVs (ARGUS-IS and the multilens array)
and astronomical systems with cylindrical 3-4° FOVs (LSST and
Pan-STARRS). Design metrics for these systems are provided in


Supplementary Table 1.1. Lens design strategies differ between arrays 
of narrow-field cameras (the multilens array) and single objectives 
with curved focal planes. Although arbitrarily large pixel counts may 
be obtained using arrays of conventional cameras, design comparisons 
confirm, as predicted, that the volume of a system with a flat focal
plane scales much faster than does that of a design with a curved focal 
plane. Multiscale design captures the advantage of camera arrays 
(off-the-shelf focal plane arrays) while avoiding the disadvantage (cost 
and volume of multiple objective lenses). AWARE uses microcamera 
arrays to create a large virtual-focal-plane array, in place of precision- 
mosaicked sensor arrays as used in ARGUS, LSST and Pan-STARRS. 
The most important points of comparison are that the AWARE design 
is scalable to larger pixel counts, as illustrated by the AWARE-40 
design, and that AWARE provides operational advantages over the
comparison systems because the focus, gain and exposure of each 
microcamera can be independently controlled. The disadvantage of 
the AWARE approach is that a stop must be introduced in the micro- 
camera to balance relative illumination and modulation transfer, 
thereby increasing system f number (the ratio of lens focal length to 
effective aperture, which is a measure of lens speed) and volume. Bare
mosaicked arrays may be preferred for fixed-focus, cost-insensitive 
airborne and astronomical systems. Multiscale design, however, 
uniquely makes possible compact, low-cost, terrestrial imaging 
systems focused at finite range. 

The potential for novel science using AWARE-2 is illustrated in 
Fig. 1, which is a gigapixel snapshot of tundra swans on Pungo Lake 
in the Pocosin Lakes National Wildlife Refuge, USA. Microcamera 
data were registered and stitched onto a rectangular grid with 38-µrad
instantaneous FOV, to produce a 0.96-gigapixel version of Fig. 1. Non-
uniformity correction and logarithmic scaling were applied to improve 
visual display. Figure 1 is downsampled to 4 megapixels for publica- 
tion. Raw microcamera images of Fig. 1 regions a, b, c, d and e are 
shown in Fig. 2. The gigapixel snapshot provides information, such as 
exactly how many swans are on Pungo Lake (656) or in the air above it 
(27) at the instant the snapshot was taken, that would be unobtainable 
using a scanned panoramic camera shooting from the same position. 
Also, the image and image sequences can be mined to analyse signal- 
ling behaviour across the flock and to track individual birds. AWARE- 
2 microcamera control modules use custom electronic modules 
(Supplementary Information, section 3) to buffer approximately ten 
image frames locally and to capture asynchronously and transfer indi- 
vidual microcamera images at up to ten frames per second. 
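
As a rough check on the sampling figures quoted for Fig. 1, the scene width covered by one pixel follows from the small-angle approximation (footprint ≈ range × instantaneous FOV). The sketch below assumes the 38-µrad instantaneous FOV given above and the 310-350 m swan distance quoted in the Fig. 2 caption; the function name is illustrative and not part of the AWARE-2 software.

```python
def pixel_footprint_m(range_m: float, ifov_rad: float) -> float:
    """Small-angle estimate of the scene width covered by one pixel."""
    return range_m * ifov_rad


IFOV_RAD = 38e-6  # instantaneous FOV per pixel, as quoted in the text

for range_m in (310.0, 350.0):
    footprint_mm = 1e3 * pixel_footprint_m(range_m, IFOV_RAD)
    print(f"{range_m:.0f} m -> {footprint_mm:.1f} mm per pixel")
```

Over this range the footprint comes out at roughly 12-13 mm, which is consistent with the 13 mm per pixel stated for the swans.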

An example HDR image is shown in Fig. 3. The brightness of 
this scene varies from regions of fully sunlit flat surfaces and bright 
sky to areas of deep shadow. An auto-exposure algorithm adjusts the 
exposure time for each individual microcamera to maximize the usage 
of the 8-bit dynamic range in each sensor. When the images are 
composited into the final scene, knowledge of the exposure setting is 
combined with the measured pixel values to estimate the source 
radiance with a global 32-bit dynamic range. Displays are typically 
limited to 8-bit output, so this HDR result must be mapped onto this 
smaller dynamic range. Figure 3 was generated by tone-mapping the 
HDR image. 
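
The compositing step described above can be sketched in a few lines. This is a minimal illustration only: it assumes a linear sensor response and ignores the flat-field and tone-mapping stages, and the function name and exposure times are hypothetical rather than taken from the AWARE-2 software.

```python
def relative_radiance(pixel_value: int, exposure_s: float, full_scale: int = 255) -> float:
    """Relative scene radiance from one 8-bit sample, assuming a linear sensor."""
    if exposure_s <= 0:
        raise ValueError("exposure time must be positive")
    return (pixel_value / full_scale) / exposure_s


# Hypothetical exposures for two microcameras viewing shadow and sunlit regions.
shadow = relative_radiance(128, exposure_s=10e-3)
sunlit = relative_radiance(128, exposure_s=0.1e-3)
print(f"sunlit/shadow radiance ratio: {sunlit / shadow:.0f}")
```

Two microcameras reporting the same mid-scale pixel value but using exposures that differ by a factor of 100 imply source radiances that differ by about the same factor, which is how per-camera 8-bit data can span a much wider global dynamic range.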

Figure 1 | Pungo Lake as captured using AWARE-2. The total FOV is 120° 
by 50° and the composite image here consists of 0.96 gigapixels, where each 
pixel has an instantaneous FOV of 38 µrad. The measured values are 
logarithmically mapped to make better use of the display dynamic range. The 
relative responsivities of the individual microcameras were adjusted iteratively 
to minimize variation across sub-image boundaries. The labelled regions are 
referred to in Fig. 2. 

Figure 2 | Details of Fig. 1. a-e, Labelled regions in Fig. 1. The swans in c are 
114 pixels long and are 310-350 m from AWARE-2. Each pixel corresponds to 
13 mm at the position of the swans. In d, the most distant bird is 17 pixels long 
and the closest is 70 pixels long. The limited depth of focus of the camera is 
illustrated in e, where regions of the foreground foliage are in sharpest focus. 


Tone mapping creates an individual 32-bit/8-bit conversion for 
each pixel in the scene, but ensures that the mappings vary smoothly 
from pixel to pixel’*. The majority of the display dynamic range is used 
on shadows and highlights, with mid-tones compressed. The tone- 
mapped image more accurately matches human visual processing 
because our vision is foveated and our pupils adjust as we examine 
different regions of a wide field. More details on the compositing 
process are provided in Supplementary Information. 

The maximum pixel count for an imager with aperture diameter A 
and mean operating wavelength $\lambda$ is $S = \pi A^2 \sin^2(\mathrm{FOV}/2)/\lambda^2$ 
(ref. 2). For AWARE-2, FOV = 120° and A = 16 mm, corresponding 
to a limit of two gigapixels at an operating wavelength of 550 nm. At its 
design capacity of 220 microcameras, a fully populated AWARE-2 
would capture three gigapixels but the estimated field would decrease 
to two gigapixels after sensor regions with limited illumination had 
been removed and overlapping regions had been merged. AWARE-2’s 
resolution is illustrated by the star field shown in Fig. 4, which shows 
details of a gigapixel sky survey with an exposure time of 1.85 s. Faint 
stars in this image illuminate two to four pixels after median filtering to 
remove hot pixels and logarithmic intensity mapping to fill the display 
dynamic range. We anticipate that higher-resolution multiscale 
systems with terrestrial motion compensation and microcamera 
adaptive optics to remove atmospheric blur may be developed for 
space situational awareness. 
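
The two-gigapixel limit quoted above can be reproduced numerically. The sketch below assumes the pixel-count expression S = πA² sin²(FOV/2)/λ² with the values stated in the text (A = 16 mm, FOV = 120°, λ = 550 nm); the function name is illustrative.

```python
import math


def max_pixel_count(aperture_m: float, fov_deg: float, wavelength_m: float) -> float:
    """Pixel-count limit S = pi * A^2 * sin^2(FOV/2) / lambda^2."""
    half_fov_rad = math.radians(fov_deg) / 2.0
    return math.pi * aperture_m**2 * math.sin(half_fov_rad) ** 2 / wavelength_m**2


s = max_pixel_count(aperture_m=16e-3, fov_deg=120.0, wavelength_m=550e-9)
print(f"{s / 1e9:.2f} gigapixels")
```

The result is just under 2 × 10⁹, in line with the stated limit for AWARE-2.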

As shown in systematic resolution images and modulation transfer 
function measurements presented in Supplementary Information, 
section 4b, most of the blur in the star field image is due to defects 
in AWARE-2’s microcamera lenses. To allow low-cost gigapixel array 
integration, AWARE-2 uses injection-moulded plastic relay optics. 
These lenses may be moulded with aspheric surfaces, requiring fewer 
elements and consequently less volume and mass than a spherical glass- 
element camera with similar performance. However, birefringence in 
the fabricated optics due to residual stresses introduced during mould- 
ing degrades AWARE-2’s image quality and resolution relative to fun- 
damental limits. We expect newer high-index plastics that minimize 
birefringence to enable the camera to approach these theoretical limits. 
We have also built glass microcamera lenses and, as shown in 
Supplementary Information, section 4c, AWARE-2 achieves pixel- 
limited optical resolution with these lenses. 

Figure 3 | Traffic circle captured using AWARE-2. Insets are digitally 
magnified by a factor of 13. Distances to the inset regions range from 15 m (‘no 
parking’ sign; first from left) to 92 m (detail of building; third from left). The 
exposure time for each microcamera was set independently of the others, and a 
tone-mapping algorithm was used to convert the resulting HDR image for 
display. Global distortion associated with mapping the 120° horizontal field 
onto a flat image is apparent. 

AWARE-2 demonstrates that the age of ever-increasing pixel 
count is far from over. Although development of high-performance, 
low-cost microcamera optics and optomechanics has been the main 
challenge in the present stage of multiscale camera development, 
integrated circuits, rather than optics, remain the primary barrier 
to ubiquitous high-pixel-count imaging. To accommodate the 
electronics and allow for heat dissipation (the camera expends 
430 W during image acquisition), AWARE-2 is mounted in a 
0.75 m × 0.75 m × 0.5 m frame. The optical system occupies less 
than 3% of the system volume. The size of the camera is dictated 
both by the size of the electronic control boards and by the need to 
cool them effectively. As more efficient and compact electronics are 
developed, hand-held gigapixel photography may become an everyday 
reality. 

Figure 4 | Details of a star field captured using AWARE-2 with a 1.85-s 
exposure time. Image data was logarithmically mapped to display values to 
make better use of the available display dynamic range. Stars with apparent 
magnitudes of m < 8.2 mag are visible in the image. However, those with 
m = 3.5 mag saturate the detector at this exposure time. 


©2012 Macmillan Publishers Limited. All rights reserved 


METHODS SUMMARY 


In Figs 1 and 2, all microcameras were set to focus at infinity and had a fixed 
exposure time of 232 µs. The image was captured on 5 December 2011 at 10:43. 
Figure 3 was captured using independent auto-exposure and auto-focus in each 
microcamera. The image shows the Fitzpatrick Center at Duke University on 18 
January 2012. In Fig. 4, all microcameras were set to focus at infinity and had a 
fixed exposure time of 1.85 s. The camera was pointed at a right ascension of 4 h 
59 min 2 s and a declination of 37° 05′ 45″ and was located at 36° 00′ 50.24″ N, 
79° 00′ 13.38″ W. Cloud cover was near zero. The typical sky brightness at this 
location is 19.50-18.38 mag arcsec⁻² in the V band (Bortle scale of approximately 
6). The image was captured on 15 January 2012 at 21:44:54 local time. 

Auto-exposure initialization requires several seconds and updates in every third 
frame thereafter. The algorithm steps exposure up and down in logarithmically 
spaced increments until 1-3% of the pixels are saturated. After exposure is set, the 
focus motor is stepped in fixed increments. The focus metric is the soft- 
thresholded sum of the absolute value of the horizontal and vertical gradients 
independently evaluated over a region of interest, typically a 1,024 X 1,024-pixel 
image centre. Soft-thresholding sets gradient values below a fixed bias to zero, thus 
emphasizing sharper features. The step direction reverses when the metric at the 
present position exceeds the metric at the last position. Focal adjustment stops 
when reversals occur in three successive steps or when eight reversals occur in 
total. Focus resumes if the metric exceeds 30% of the stored minimum value. If the 
number of saturated pixels leaves the 1-3% target range, auto-exposure and focus 
reset. 
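
The focus metric and the exposure target described above can be sketched as follows. This is a pure-Python illustration under stated assumptions, not the camera firmware: the thresholding follows the description in the text (gradient magnitudes below the bias are set to zero and the rest are kept), the region of interest is a small nested list rather than a 1,024 × 1,024 image, and all function names are hypothetical.

```python
from typing import Sequence

Image = Sequence[Sequence[float]]


def focus_metric(roi: Image, bias: float) -> float:
    """Sum absolute horizontal and vertical gradients, zeroing those below bias."""
    rows, cols = len(roi), len(roi[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # horizontal neighbour exists
                gx = abs(roi[r][c + 1] - roi[r][c])
                total += gx if gx >= bias else 0.0
            if r + 1 < rows:  # vertical neighbour exists
                gy = abs(roi[r + 1][c] - roi[r][c])
                total += gy if gy >= bias else 0.0
    return total


def saturated_fraction(roi: Image, full_scale: int = 255) -> float:
    """Fraction of pixels at or above full scale."""
    pixels = [p for row in roi for p in row]
    return sum(1 for p in pixels if p >= full_scale) / len(pixels)


def exposure_on_target(fraction: float, low: float = 0.01, high: float = 0.03) -> bool:
    """True when the saturated fraction lies in the 1-3% window from the text."""
    return low <= fraction <= high


sharp = [[0, 0, 10, 10], [0, 0, 10, 10]]
blurred = [[0, 3, 7, 10], [0, 3, 7, 10]]
print(focus_metric(sharp, bias=5), focus_metric(blurred, bias=5))
```

With the same total intensity step, the sharp edge contributes to the metric while the blurred edge falls entirely below the bias, which is the sense in which thresholding emphasizes sharper features.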

AWARE-2 was pointed at clear sky with a diffuse plastic dome over the gigagon 
lens location, to calibrate each pixel’s gain and illumination. In all figures shown, 
variable pixel gain and illumination were removed by ‘flat-field correction’. 
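
A flat-field correction of the kind mentioned above is commonly implemented by dividing each raw pixel by the corresponding value in a calibration frame of a uniform source and rescaling by the calibration mean. The sketch below assumes that standard form, with dark-subtracted, non-zero calibration values; the source does not give the exact procedure, so the details here are illustrative.

```python
from typing import List

Frame = List[List[float]]


def flat_field_correct(raw: Frame, flat: Frame) -> Frame:
    """Divide out per-pixel gain/illumination, preserving the mean signal level."""
    n_pixels = len(flat) * len(flat[0])
    flat_mean = sum(sum(row) for row in flat) / n_pixels
    return [
        [raw_px * flat_mean / flat_px for raw_px, flat_px in zip(raw_row, flat_row)]
        for raw_row, flat_row in zip(raw, flat)
    ]


# Hypothetical 2 x 2 example: the right-hand column has half the responsivity.
flat = [[1.0, 0.5], [1.0, 0.5]]
raw = [[100.0, 50.0], [100.0, 50.0]]  # a uniform scene seen through that response
print(flat_field_correct(raw, flat))
```

After correction the uniform scene is reported with the same value in every pixel, as expected once the fixed response pattern has been removed.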

First dairying in green Saharan Africa in the fifth millennium BC 

Julie Dunne¹, Richard P. Evershed¹, Mélanie Salque¹, Lucy Cramp¹, Silvia Bruni², Kathleen Ryan³, Stefano Biagetti⁴ & Savino di Lernia⁴,⁵ 


In the prehistoric green Sahara of Holocene North Africa—in con- 
trast to the Neolithic of Europe and Eurasia—a reliance on cattle, 
sheep and goats emerged as a stable and widespread way of life, 
long before the first evidence for domesticated plants or settled 
village farming communities’*. The remarkable rock art found 
widely across the region depicts cattle herding among early 
Saharan pastoral groups, and includes rare scenes of milking; 
however, these images can rarely be reliably dated*. Although the 
faunal evidence provides further confirmation of the importance 
of cattle and other domesticates’, the scarcity of cattle bones makes 
it impossible to ascertain herd structures via kill-off patterns, 
thereby precluding interpretations of whether dairying was prac- 
ticed. Because pottery production begins early in northern Africa® 
the potential exists to investigate diet and subsistence practices 
using molecular and isotopic analyses of absorbed food residues’. 
This approach has been successful in determining the chronology 
of dairying beginning in the ‘Fertile Crescent’ of the Near East and 
its spread across Europe*”'. Here we report the first unequivocal 
chemical evidence, based on the δ¹³C and Δ¹³C values of the major 
alkanoic acids of milk fat, for the adoption of dairying practices by 
prehistoric Saharan African people in the fifth millennium BC. 
Interpretations are supported by a new database of modern rumin- 
ant animal fats collected from Africa. These findings confirm the 
importance of ‘lifetime products’, such as milk, in early Saharan 
pastoralism, and provide an evolutionary context for the emer- 
gence of lactase persistence in Africa. 

It is widely accepted that African pastoralism with cattle, sheep and 
goats emerged long before plant domestication’, in contrast to the 
process of ‘neolithization’ in the Near East, characterized by the trans- 
ition from a mobile hunter-gatherer lifestyle to an increasingly settled, 
agricultural way of life. In Saharan Africa, during the Early Holocene, 
largely sedentary and pottery-producing hunters, fishers and gatherers 
became nomadic cattle herders’, dynamically adapting to, and exploit- 
ing, different environments and resources. 

Today, it seems impossible that cattle could survive in such a hostile 
environment as the arid desert land of the Sahara, but this region 
enjoyed vastly more favourable climatic and environmental condi- 
tions’? during the Holocene African Humid Period, which began 
around 10,000 years ago”. Here, faunal evidence demonstrates that 
by the early sixth millennium BC, cattle, sheep and goats were found 
together across the savannas of what is now the Sahara’. This suggests 
that the inception of dairying practices in North Africa and an early 
and independent ‘secondary products’ economy™ seems plausible 
given what we now know of the first appearance of milking in the 
Near East’. 

Compelling evidence of prehistoric cattle herding in northern Africa 
comes from the remarkable rock paintings and engravings of the 
Sahara (Fig. 1), possibly the world’s largest concentration of prehistoric 
art, long known for their rich and vivid portrayal of scenes from 
everyday life. The extensive rock art demonstrates that cattle played an 
important part in the lives and ideology of ancient human groups living 
in this region during the Holocene. This pictorial record contains 
countless scenes with representations of cattle, some emphasizing 
the female’s full udders and, in a few cases, depictions of the actual 
milking of a cow, such as at Wadi Teshuinat II in the Acacus or Wadi 
Tiksatin in the Messak. However, reliable dates for this rock art can 
rarely be ascertained. 

Faunal remains from securely dated contexts indicate that 
domesticated animals (cattle, sheep or goats) were present in the area 
from the early sixth millennium BC, becoming much more common in the 
fifth millennium BC. Unfortunately, these remains are highly 
fragmented and poorly preserved, precluding herd reconstructions, and 
thus even indirect evidence of dairying is missing. 

Direct evidence for the practice of dairying, beginning in the seventh 
millennium BC in northwestern Anatolia, appearing in the sixth 
millennium BC in eastern Europe and reaching Britain in the fourth 
millennium BC, has been established through the compound-specific 
stable carbon isotope analysis of animal fat residues preserved 
in archaeological pottery. Notably, this research on the antiquity of 
dairying practices has largely been confined to Europe, the Near East 
and Eurasia, with no attempt yet being made to identify the inception 
of dairying practices in the African continent. 

Here we present direct chemical evidence for early dairying 
practices within the central Sahara through the use of gas chromatography 
(GC), gas chromatography-mass spectrometry (GC-MS) and gas 
chromatography-combustion-isotope ratio mass spectrometry 
(GC-C-IRMS) analyses carried out on organic residues extracted 
from archaeological pottery sampled from the Takarkori rock shelter 
located in the southwest Fezzan, Libyan Sahara, an area licensed to 
Sapienza University of Rome (Supplementary Fig. 1). Four seasons of 
fieldwork identified evidence of Late Acacus (hunter-gatherer) 
occupation followed by Early, Middle and Late Pastoral remains 
(Supplementary Fig. 2), dating between approximately 8100 and 2600 BC 
(Supplementary Table 1). This long Pastoral period, between 
approximately 6000 and 2600 BC, denotes the adoption of cattle together with 
sheep and goats, combined with intensive exploitation of wild cereals. 

Analyses of absorbed organic residues focused on 81 potsherds 
(Supplementary Table 2), covering a wide range of decorative 
techniques and motifs found on Saharan ceramics (Supplementary 
Fig. 3). These vessels were mainly excavated from securely dated 
Middle Pastoral (n = 56) levels (approximately 5200-3800 BC), with a 
small number originating from the Late Acacus (n = 8) and Early 
(n = 14) and Late Pastoral (n = 3) periods. The lipids were extracted 
using established protocols. Many potsherds demonstrated 
extraordinary preservation of lipids, containing concentrations of up to 
6 mg g⁻¹ (mean 1.2 mg g⁻¹), with one particular potsherd (TAK 
443) having a concentration of 17 mg g⁻¹. It is noteworthy that lipids 
were observed in every potsherd, in contrast to European 
archaeological sites, where generally <40% of potsherds contain extractable 
lipids with mean concentrations of approximately 0.1 mg g⁻¹ (refs 
10, 20). This remarkable preservation is likely to be related to the 
extremely arid conditions prevailing in the region. 

¹Organic Geochemistry Unit, School of Chemistry, University of Bristol, Cantock’s Close, Bristol BS8 1TS, UK. ²Dipartimento di Chimica Inorganica, Metallorganica e Analitica “Lamberto Malatesta”, 
Università degli Studi di Milano - Via G. Venezian 21, 20133 Milano, Italy. ³African Section, University of Pennsylvania Museum of Archaeology and Anthropology, 3260 South Street, Philadelphia, 
Pennsylvania 19104-6324, USA. ⁴Dipartimento di Scienze dell’Antichità, Sapienza, Università di Roma, Via Palestro, 63 - 00185 Roma, Italy. ⁵School of Geography, Archaeology & Environmental Sciences, 
University of the Witwatersrand, Johannesburg, Private Bag 3, Wits 2050, South Africa. 

Figure 1 | Rock art image and tracing from Teshuinat II rock shelter, South West Libya. a, b, Rock art image (a) and tracing (b) showing Saharan pastoralists 
with their pots and cattle (adapted with permission from ref. 15). 

Lipid biomarker analyses by GC-MS showed that residues fall into 
three broad categories (Fig. 2). The most common distribution 
(Fig. 2a) was dominated by high abundances of the C16:0 and C18:0 
fatty acids, which derive from degraded animal fats. Also abundant 
were branched-chain fatty acids, C3 to C,g, components of bacterial 
origin diagnostic of ruminant animal fats’'. The second most common 
type of residue (Fig. 2c) contained a relatively low abundance of the 
C18:0 alkanoic acid, with several extracts showing high abundances of 
C12 and C14 homologues. Such distributions have rarely been seen in 
European pottery and are more diagnostic of plant oils”. Also present, 
and again rarely seen in European prehistoric ceramics, are a homo- 
logous series of long-chain n-alkanes from C;¢—C33 (odd-over-even 
carbon number predominance), usually maximising at C25, regarded 
as originating from epicuticular waxes of vascular plants”. A third, 
intermediate category of residue (Fig. 2b) is characterized by a series of 
α,ω-dicarboxylic acids (diacids), in the C; to Cig carbon-chain-length 
range (Fig. 2a, b) with C9 (azelaic acid) the most abundant homologue, 
the latter commonly deriving from the ‘drying reaction’ of plant oils”. 
Such residues also contained long-chain alkyl lipids of plant origin. 
Together such mixtures probably reflect either processing of both 
plant and animal products in the vessels or the multi-use of vessels. 

Of the 81 potsherds, only those residues unambiguously assigned as 
degraded animal fats (Table 1), that is, those dominated by palmitic 
(C16:0) and stearic (C18:0) alkanoic acids (for example, Fig. 2a), were 
selected for GC-C-IRMS analysis to determine the δ¹³C values for 
C16:0 and C18:0, with the aim of establishing their origins. Differences 
in the δ¹³C values of C16:0 and C18:0 alkanoic acids are due to the 
differential routing of dietary carbon and fatty acids during the 
synthesis of adipose and dairy fats in ruminant animals, thus allowing 
ruminant milk fatty acids to be distinguished from carcass fats by 
calculating Δ¹³C values ($\Delta^{13}\mathrm{C} = \delta^{13}\mathrm{C}_{18:0} - \delta^{13}\mathrm{C}_{16:0}$) and plotting them against 
the δ¹³C value of the C16:0 alkanoic acid. Previous research has shown 
that by plotting Δ¹³C values, variations in C3 versus C4 plant 
consumption are removed, thereby emphasizing biosynthetic and metabolic 
characteristics of the fat source. We have now confirmed this 
through the GC-C-IRMS of a new reference collection of modern 
ruminant animal fats from Africa collected to encompass the range 
of carbon isoscapes likely to have been encountered by early Saharan 
pastoralists. The δ¹³C values of the C16:0 and C18:0 components of these 
modern fats are presented in Fig. 3, which shows goat dairy fats 
from the Acacus region, Libya (n = 9), together with cattle dairy fats 
and cattle, sheep and goat adipose fats (n = 9, 12, 7 and 12, 
respectively) from Kenya. The δ¹³C values for the C16:0 alkanoic acids of 
the African reference fats plot in the range from −35 to −15‰, 
indicating diets ranging from predominantly C3 to C4. These results 
confirm the global applicability of the Δ¹³C proxy. 
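
The Δ¹³C proxy described above is a simple difference, Δ¹³C = δ¹³C(C18:0) − δ¹³C(C16:0). The sketch below checks it against one row of Table 1 (potsherd TAK21A, with δ¹³C values of −14.7‰ for C16:0 and −20.5‰ for C18:0). The function name is illustrative, and no classification thresholds are coded because the text does not state them.

```python
def big_delta_13c(delta_c18_0: float, delta_c16_0: float) -> float:
    """Delta13C (per mil) = delta13C of C18:0 minus delta13C of C16:0."""
    return delta_c18_0 - delta_c16_0


# Values for potsherd TAK21A as listed in Table 1.
print(round(big_delta_13c(delta_c18_0=-20.5, delta_c16_0=-14.7), 1))
```

The result, −5.8‰, matches the Δ¹³C value tabulated for that sherd, which is classified there as dairy fat.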

Of the 29 animal fat residues selected for GC-C-IRMS analyses, 22 
originate from Middle Pastoral levels, 3 from the Late Acacus, 2 from 
the Early Pastoral and the remaining 2 from the Late Pastoral period 
(Table 1). The comparison of the Δ¹³C values of the modern reference 
animal fats with those of the archaeological pottery residues from the 
Middle Pastoral period (approximately 5200-3800 BC) shows that 50% 
of these plot within, or on the edge of, the isotopic ranges for dairy fats, 
with a further 33% falling within the range for ruminant adipose fats 
and the remainder corresponding to non-ruminant carcass fats 
(Fig. 3). Notably, the residues originating from earlier periods do not 
contain dairy fats, and plot in the non-ruminant fat range, probably 
deriving from wild fauna found locally. The unambiguous conclusion 
is that the appearance of dairy fats in pottery correlates with the 
more abundant presence of cattle bones in the cave deposits, suggesting 

Table 1 | Subset of potsherds selected for isotopic analyses 

Figure 2 | Partial gas chromatograms displaying the trimethylsilylated lipid 
extract from potsherds excavated from Middle Pastoral levels in the 
Takarkori rock shelter. a—c, The distributions are characteristic of degraded 
animal fat (a), a mixture of animal and plant fats (b) and plant material 

(c). Chromatographic peak identities denoted by filled triangles comprise 
straight-chain fatty acids in the carbon chain range Co.9 to Coo, maximizing at 
Cy6.0; filled squares represent n-alkanes in the carbon-chain range Cy9.9 to C35.03 
and filled circles indicate α,ω-dicarboxylic acids (diacids) in the carbon-chain 
range Cs.9 to Cy¢.. IS, internal standard, C34 n-tetratriacontane. 


a full pastoral economy as the cattle were intensively exploited for their 
secondary products. 

Of particular note is the wide range of δ¹³C values exhibited by the 
alkanoic acids, plotting across the range −25‰ to −10‰, which is 
broader even than the reference fats range (maximum −15‰). This 
suggests that the animals giving rise to these fats had subsisted on an 
extensive range of different forages either composed completely of C3 
plants, varying combinations of C3 and C4 plants, or a diet comprising 
wholly C4 plants. The wide range of alkanoic acid δ¹³C values found for 
these African potsherds is unprecedented and points to differing 
pastoral modes of subsistence (such as vertical transhumance, which 
is still practised today) by these prehistoric Saharan groups. This is 
supported by their settlement pattern based on summer sites in the 
lowland sand seas and winter sites (such as Takarkori) in the mountains, 
which was probably in response to seasonal weather patterns. 

Our findings provide unequivocal evidence for extensive processing 
of dairy products in pottery vessels in the Libyan Sahara during the 
Middle Pastoral period (approximately 5200-3800 BC), confirming 
that milk played an important part in the diet of these prehistoric 
pastoral people. The findings are notable for four other reasons: (1) 
they confirm that domesticated cattle, used as part of a dairying 
economy, were present in North Africa during the fifth millennium BC, 
thus supporting the idea of an earlier ingression into the central 
Sahara and suggesting a local process of pastoral development, based 
on the exploitation of secondary products; (2) the finding of dairy fat 
residues in pottery is consistent with milk being processed, thereby 

Potsherd number | Laboratory code | Period | Wall decoration technique | Diameter at mouth (cm) | Part of vessel | Lipid concentration (µg g⁻¹) | δ¹³C16:0 (‰) | δ¹³C18:0 (‰) | Δ¹³C (‰) | Classification 
21 | TAK21A | Middle Pastoral | Plain edge fishnet | Not known | Wall + base | 5,830.6 | −14.7 | −20.5 | −5.8 | Dairy 
26 | TAK1 | Middle Pastoral | APS return | 20 | Rim + wall | 760.7 | −14.2 | −15.0 | −0.9 | Ruminant adipose 
45 | TAK45 | Middle Pastoral | APS return | Not known | Wall | 639.8 | −21.9 | −24.1 | −2.1 | Ruminant adipose 
120 | TAK120 | Middle Pastoral | APS return | Not known | Wall | 5,592.7 | −15.2 | −18.7 | −3.5 | Dairy 
124 | TAK124 | Middle Pastoral | Ridged APS return | Not known | Wall | 615.5 | −18.1 | −20.1 | −2.0 | Ruminant adipose 
197 | TAK197 | Middle Pastoral | APS return | Not known | Wall | 515 | −20.9 | −21.1 | −0.2 | Non-ruminant adipose 
420 | TAK420 | Middle Pastoral | APS return, triangles | Not known | Wall | 119.3 | −18.3 | −21.5 | −3.2 | Dairy 
443 | TAK443 | Middle Pastoral | APS return | Not known | Wall | 17,217.6 | −16.9 | −23.7 | −6.8 | Dairy 
576 | TAK6 | Middle Pastoral | Ridged plain edge | 30 | Rim + wall | 800.2 | −22.0 | −21.7 | 0.3 | Non-ruminant adipose 
748 | TAK9 | Middle Pastoral | Ridged APS return | 22 | Rim + wall | 5,650.5 | −13.7 | −19.0 | −5.2 | Dairy 
824 | TAK11 | Late Pastoral | Continuous plain edge and impressed dashes | Not known | Wall + base | 4,994.2 | −20.5 | −24.9 | −4.4 | Dairy 
873 | TAK873 | Middle Pastoral | APS return | Not known | Wall | 718 | −18.5 | −17.7 | 0.8 | Non-ruminant adipose 
896 | TAK896 | Middle Pastoral | APS return | Not known | Wall | 218.0 | −23.6 | −25.0 | −1.5 | Ruminant adipose 
987 | TAK987 | Middle Pastoral | APS return | Not known | Rim + wall | 4,442.6 | −13.6 | −19.3 | −5.7 | Dairy 
997 | TAK15 | Middle Pastoral | APS return | 16 | Rim + wall | 117.4 | −13.3 | −17.4 | −4.1 | Dairy 
1009 | TAK1009 | Middle Pastoral | APS paired lines, banded | Not known | Wall | 55.7 | −11.0 | −11.0 | 0.0 | Non-ruminant adipose 
1012 | TAK1012 | Middle Pastoral | Irregular APS return | Not known | Wall | 3,591.2 | −14.9 | −16.5 | −1.7 | Ruminant adipose 
1572 | TAK1572 | Middle Pastoral | APS return | Not known | Wall | 3,148.5 | −23.7 | −28.2 | −4.5 | Dairy 
1693 | TAK21 | Late Acacus | Undecorated | Not known | Rim + wall | 20.1 | −23.1 | −19.8 | 3.3 | Non-ruminant adipose 
1797 | TAK24 | Early Pastoral | Combined plain edge with semilunar-motif impressed dots | 30 | Rim + wall | 674.6 | −21.9 | −21.0 | 0.9 | Non-ruminant adipose 
1846 | TAK25 | Middle Pastoral | Plain edge continuous | Not known | Wall + base | 819.2 | −15.6 | −19.7 | −4.1 | Dairy 
1863 | TAK26 | Middle Pastoral | Ridged APS return | 16 | Rim + wall | 75.0 | −22.3 | −26.2 | −4.0 | Dairy 
1903 | TAK27 | Early Pastoral | Cord impression | 20 | Rim + wall | 308.5 | −22.8 | −21.7 | 1.1 | Non-ruminant adipose 
2028 | TAK2028 | Middle Pastoral | APS, paired lines of dashes | Not known | Wall + base | 931.0 | −24.5 | −28.9 | −4.4 | Dairy 
2251 | TAK28 | Middle Pastoral | APS (row of impressed dots) | 14 | Rim | 96.9 | −21.5 | −24.0 | −2.5 | Ruminant adipose 
2523 | TAK29 | Late Pastoral | Undecorated | Not known | Rim + wall | 445.6 | −18.5 | −19.7 | −1.2 | Ruminant adipose 
2588 | TAK30 | Late Acacus | UNCL (impressed dots) | 30 | Rim + wall | 823.3 | −13.9 | −13.8 | 0.1 | Non-ruminant adipose 
2817 | TAK32 | Late Acacus | Rocker packed | Not known | Rim | 6,882.8 | −19.3 | −17.5 | 1.8 | Non-ruminant adipose 
2857 | TAK35 | Middle Pastoral | APS return continuous, triangles | Not known | Rim + wall | 238.3 | −20.1 | −22.9 | −2.8 | Ruminant adipose 

APS, alternately pivoting stamp. 

providing an explanation of how, in spite of lactose intolerance, milk 
products could be consumed by these people with the practice being 
adopted quickly; (3) they are consistent with the finding of the 
−13910*T allele, associated with the lactase persistence trait in 
Europeans, across some Central African groups such as the Fulbe from 
northern Cameroon, supporting arguments for some movement of 
people, together with their cattle, from the Near East into eastern 
Africa in the Early to Middle Holocene; and (4) they provide a context 
for understanding the origins and spread of other, independently 
arising LP-associated gene variants in sub-Saharan Africa. 

Figure 3 | Plots of δ¹³C values and Δ¹³C values of alkanoic acids in modern 
reference ruminant fats and archaeological animal fat residues in 
prehistoric Saharan pottery. a-c, Plots of the δ¹³C16:0 and δ¹³C18:0 values for 
archaeological animal fat residues in Late Acacus (hunter-gatherer) and Early 
Pastoral (Neolithic) pottery (a), archaeological animal fat residues in Middle 
and Late Pastoral Neolithic pottery (b), and modern reference animal fats 
collected from Libya and Kenya (c). d-f, Plots denote Δ¹³C values for the 
archaeological fat residues (Late Acacus/Early Pastoral) (d) and Middle/Late 
Pastoral (e) and modern reference animal fats (f). Notably, the residues 
originating from the Late Acacus and Early Pastoral periods (d) do not contain 
dairy fats, and plot in the non-ruminant range, probably deriving from wild 
fauna. e, The extensive processing of dairy products in pottery vessels from this 
region begins in the Middle Pastoral period approximately 5200-3800 BC. The 
broad array of values suggests that the animals giving rise to these ruminant fats 
subsisted on an extensive range of different diets either composed completely of 
C3 plants, varying amounts of C3 and C4 plants or, for some of the 
archaeological samples, a diet comprising wholly C4 plants. The ranges shown 
here represent the mean ± 1 standard deviation of the Δ¹³C values for a global 
database comprising modern reference ruminant animal fats from Africa, the 
UK (animals raised on a pure C3 diet), Kazakhstan, Switzerland and the 
Near East, published elsewhere. 


METHODS SUMMARY 


A total of 81 potsherds were sampled from the Takarkori rock shelter, Tadrart 
Acacus Mountains, Libyan Sahara, of which 56 were excavated from the Middle 
Pastoral period and the remainder originated from the Late Acacus (n = 8), and 
Early (n = 14) and Late Pastoral (n = 3) periods (Supplementary Table 2). Of the 
81 potsherds analysed, 29 were selected for GC-C-IRMS analysis; of these, 18 
showed clear evidence of pure animal fat origin, with the remaining 11 comprising 
lipid profiles suggestive of the mixing of animal and plant fats. 

Lipid analysis and interpretations were performed using established protocols 
described in detail in earlier publications. Briefly, ~2 g of potsherd was 
sampled and surfaces were cleaned with a modelling drill to remove any 
exogenous lipids. The potsherds were then ground to a powder, an internal 
standard was added, and the powder was solvent-extracted by ultrasonication 
(chloroform/methanol 2:1 v/v, 2 × 10 ml). The solvent was evaporated under a 
gentle stream of nitrogen to obtain the total lipid extract (TLE). Aliquots of 
the TLE were trimethylsilylated (N,O-bis(trimethylsilyl)trifluoroacetamide 
80 µl, 70 °C, 60 min), and submitted to analysis by GC and GC-MS. Further 
aliquots of the TLE were treated with NaOH/H2O (9:1 w/v) in methanol (5% v/v, 
70 °C, 1 h). Following neutralization, lipids were extracted into chloroform 
and the excess solvent was evaporated under a gentle stream of nitrogen. Fatty 
acid methyl esters (FAMEs) were prepared by reaction with BF3-methanol 
(14% w/v, 70 °C, 1 h). The FAMEs were extracted with chloroform and the 
solvent removed under nitrogen. The FAMEs were re-dissolved into hexane for 
analysis by GC and GC-C-IRMS. 

FAMEs of freeze-dried reference fats (typically using 5 mg of TLEs) were 
prepared exactly as above. 

The clonal and mutational evolution spectrum of primary triple-negative breast cancers 

Primary triple-negative breast cancers (TNBCs), a tumour type 
defined by lack of oestrogen receptor, progesterone receptor and 
ERBB2 gene amplification, represent approximately 16% of all 
breast cancers’. Here we show in 104 TNBC cases that at the time 
of diagnosis these cancers exhibit a wide and continuous spectrum of 
genomic evolution, with some having only a handful of coding 
somatic aberrations in a few pathways, whereas others contain 
hundreds of coding somatic mutations. High-throughput RNA 
sequencing (RNA-seq) revealed that only approximately 36% of 
mutations are expressed. Using deep re-sequencing measurements 
of allelic abundance for 2,414 somatic mutations, we determine for 
the first time—to our knowledge—in an epithelial tumour subtype, 
the relative abundance of clonal frequencies among cases represent- 
ative of the population. We show that TNBCs vary widely in their 
clonal frequencies at the time of diagnosis, with the basal subtype of 
TNBC showing more variation than non-basal TNBC. Although 
p53 (also known as TP53), PIK3CA and PTEN somatic mutations 
seem to be clonally dominant compared to other genes, in some 
tumours their clonal frequencies are incompatible with founder 
status. Mutations in cytoskeletal, cell shape and motility proteins 
occurred at lower clonal frequencies, suggesting that they occurred 
later during tumour progression. Taken together, our results show 
that understanding the biology and therapeutic responses of patients 
with TNBC will require the determination of individual tumour 
clonal genotypes. 

To understand the patterns of somatic mutation in TNBC, we 
enumerated genome aberrations at all scales from 104 cases of primary 
TNBC (Affymetrix SNP6.0, 104 cases; RNA-seq, 80 cases; genome/ 
exome sequencing, 65 cases: 54 exomes, 15 genomes with 4 overlap- 
ping) (Supplementary Table 1 and Supplementary Fig. 1), annotated 
with clinical information (Supplementary Table 2). We revalidated 
2,414 somatic single nucleotide variants (SNVs) (Supplementary 
Table 3) with targeted deep sequencing to a median of 20,000× 
coverage, including 43 non-coding splice site dinucleotide mutations 
(Supplementary Table 4) and 104 genes with 107 indels (Supplemen- 
tary Table 5 and Supplementary Methods). Notably, the distribution of 
somatic mutation abundance varies in a continuous distribution 
among tumours (Fig. 1a) and seems to be unrelated to the proportion 
of the genome altered by copy number alterations (CNAs) (Fig. 1b) or 
tumour cellularity (Supplementary Fig. 2b). Although this distribution 
could be partially explained by a false-negative rate in mutation dis- 
covery, others have noted similar distributions in epithelial cancers, 
suggesting that the total mutation content of individual tumours may 
be shaped by biological processes or differential exposure to mutagenic 
influences in the population. 

The overall pattern (Supplementary Fig. 3a, b) of CNA abundance 
appears similar (Supplementary Fig. 4) to that seen in a larger, inde- 
pendent series of ~2,000 SNP6.0 profiled breast tumours. Among the 
most frequently observed CNA events (Supplementary Table 6) are the 
tumour suppressor and oncogenes PARK2 (6%), RB1 (5%), PTEN 
(3%) and EGFR (5%). Here we report intragenic deletions (Sup- 
plementary Fig. 5) in the PARK2 tumour suppressor, specifically 
linking PARK2 with TNBC for the first time. Consistent with previous 
reports in breast cancer, we did not observe frequent recurrent struc- 
tural rearrangements (Supplementary Fig. 3d and Supplementary 
Table 7), although we revalidated many individual fusion events invol- 
ving known oncogenes or tumour suppressors (for example, KRAS, 
RB1, IDH1, ETV6) (Supplementary Tables 8-10). 

A comparison of RNA-seq data with genomes/exomes data revealed 
that only 36% of validated somatic SNVs were observed in the transcrip- 
tome sequence (Supplementary Table 3 and Supplementary Fig. 2b). In a 
recent lymphoma study, similar proportions were observed (137 of 329 
somatic mutations expressed in RNA-seq). As expected, the propor- 
tion of low-abundance somatic SNVs observed in RNA is reflected in 
the distribution of wild-type, heterozygous and homozygous expressed 
mutations (Supplementary Fig. 2b), consistent with the notion that 
low-abundance alleles may represent rarer clones in the primary 
tumour. We found 43 splice junction mutations with evidence for an 
impact on splicing patterns (Supplementary Table 4), encompassing 
several known tumour suppressors (p53, PIK3R1; Supplementary Fig. 
6) as well as many genes not yet implicated in carcinogenesis. Analysis of 
72 somatic mutations in the non-coding space of experimentally deter- 
mined human regulatory regions showed (Supplementary Table 11) a 
significant overrepresentation (31.9% versus expected 2.5%, Fisher exact 
test P=2%X10'’) of mutations within retinoblastoma-associated 
protein (RB)-binding sites. Six mutations were predicted to be damaging 
to RB binding (Supplementary Methods and Supplementary Fig. 7). 
This is consistent with observations of frequent functional disruption 
of the RB-regulated cell cycle network in TNBC. 
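
As a rough sanity check on the overrepresentation reported above, the sketch below computes an exact one-sided binomial tail probability. This is a stand-in for, not a reproduction of, the Fisher exact test used in the study, because the background counts needed for a contingency table are not given here. It assumes that 31.9% of the 72 mutations corresponds to 23 mutations in RB-binding sites and treats the expected 2.5% as a fixed background rate; the function name is illustrative.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_mutations = 72                        # regulatory-region mutations analysed
observed = round(0.319 * n_mutations)   # 31.9% of 72, i.e. 23 mutations (assumed)
expected_rate = 0.025                   # expected fraction in RB-binding sites

p_value = binomial_upper_tail(observed, n_mutations, expected_rate)
print(f"observed {observed}/{n_mutations}, upper-tail P = {p_value:.2e}")
```

The binomial tail only asks how surprising the observed count is under a fixed background rate; a Fisher exact test additionally accounts for uncertainty in the background counts, so the two P values need not agree.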

We next searched for mutation enrichment patterns in three ways: 
by single gene mutation frequency over multiple cases; by the mutation 
frequency over multiple members of a gene family; and by correlating 
mutation status with expression networks. First, similar to other 
studies, p53 is the most frequently mutated gene (Supplementary 
Table 12) with 62% of basal TNBC (determined by gene expression 
classification with PAM50 (ref. 16) analysis on RNA-seq expression 
profiles) and 43% of non-basal TNBC cases harbouring a validated 
somatic mutation. We also observed frequent mutations in PIK3CA at 
10.2% (7/65), USH2A (Usher syndrome gene, implicated in actin 
cytoskeletal functions) at 9.2% (6/65), MYO3A at 9.2%, PTEN and 
RB1 at 7.7% (5/65) and a further eight genes (including ATR, UBR5 
(also known as EDD1), COL6A3) at 6.2% (4/65) of cases in the cohort 

Figure 1 | Distribution of number of validated somatic mutations by case 
over 65 cases. a, Mutation frequency (basal, red; other, grey). 
Patients harbouring known driver gene mutations 
are indicated. b, Case-specific and overall (inset) 
distributions of mutations in CNA classes. AMP, 
amplification; GAIN, single copy gain; HETD, 
hemizygous deletion; HLAMP, high-level 
amplification; HOMD, homozygous deletion; 
NEUT, no copy number change. The number of 
(HOMD, HLAMP) CNAs (black diamonds) and 
percentage genome altered (green circles) are 
indicated. 

(Fig. 2a). Considering background mutation rates, p53, PIK3CA, 
RB1, PTEN, MYO3A and GH1 showed evidence of single gene selec- 
tion (q < 0.1) (Supplementary Table 13). Additional recurrent muta- 
tions of note occurred in the synuclein genes (SYNE1 and SYNE2, 9.2%, 
6/65, recently implicated in squamous head and neck cancers), 
BRCA2 (three cases), and several other well known oncogenes 
(BRAF, NRAS, ERBB2 and ERBB3) with mutations in two cases each. 
Approximately 20% of cases contained examples of potentially 
‘clinically actionable’ somatic aberrations, including BRAF V600E, 
high-level EGFR amplifications and ERBB2 and ERBB3 mutations. 
In the second approach we searched for statistically overrepresented 
gene families and protein functions using the Reactome functional 
protein interaction database (Supplementary Methods). This ana- 
lysis quantifies gene family involvement through sparse mutation 
patterns in functionally connected genes, which would be statistically 
underrepresented by single gene recurrent mutation analysis. The 
overrepresented pathways (false discovery rate (FDR) < 0.001) 
included p53-related pathways along with chromatin remodelling, 
PIK3 signalling, ERBB signalling, integrin signalling and focal adhesion, 
WNT/cadherin signalling, growth hormone and nuclear receptor co- 
activators, and ATM/RB-related pathways (Fig. 3a and Supplementary 
Table 14). We note that the candidate ‘driver’ MYO3A, a cytoskeleton 
motor protein involved in cell shape and motility, relates to several 
pathways upstream and downstream of integrin signalling. The 
mutated genes include extracellular matrix (ECM) interactions 
(laminins, collagens), ECM receptors (integrins), several proteins 

cases in green, and ER in blue), 
shown as a percentage of cases (in 
parentheses) with one or more 
mutations. *P < 0.05. 

regulating actin cytoskeleton dynamics (usherin, palladin, multiple 
myosins) and microtubule motor proteins (kinesins) (Fig. 2a). All of 
these contribute to cellular processes that have been functionally impli- 
cated in cancer progression; however, a signature of somatic mutation 
associated with these proteins has not been previously noted in TNBC. 

Figure 3 | Network analysis of 254 recurrently mutated genes by somatic 
point mutations and indels. a, Case-specific mutations shaded according to 
clonal frequencies in known driver genes, plus genes from integrin signalling 
and ECM-related proteins (laminins, collagens, integrins, myosins and 
dyneins). b, Significantly overrepresented pathways (FDR < 0.001) from 
recurrently mutated genes (see Supplementary Methods). Node shading 
encodes the adjusted P value (q value) of the comparison of the distribution of 
clonal frequencies of mutations in a given pathway to the overall distribution of 
clonal frequencies. A spectrum of higher (red) and lower (yellow) clonal 
frequencies is evident. Letters in parentheses indicate database sources. 

To confirm the mutational spectrum in the general breast cancer popu- 
lation we re-sequenced all exons of 29 genes in an additional 159 breast 
cancers (82 oestrogen receptor (ER)+ and 77 ER−, tumour and 
matched normal) (Fig. 2b), and confirmed that many of the genes 
found in the discovery cohort were recurrently mutated in an addi- 
tional population. Whether this pattern of mutation represents the 
occurrence of disease-modifying mutations, or possibly selection from 
other processes (for example, transcription-related hypermutation) is 
unknown. Interestingly, the enrichment of cytoskeletal functions in the 
somatic aberration landscape is also evident from the copy number and 
alternative splicing landscapes (Supplementary Fig. 8). 

Third, we integrated both the CNA and mutation data with expres- 
sion data to reveal genomic events associated with extreme changes in 
the transcription of interacting genes (Table 1), using a bipartite 
graph-based method (driverNet; Supplementary Methods). The 
somatic aberrations showing statistically significant association with 
extreme expression in this analysis (P<0.05) (Table 1 and Sup- 
plementary Table 15) implicate well known oncogenes and tumour 
suppressors (TP53, PIK3CA, NRAS, EGFR, RB1, ATM) and suggest 
several new genes of interest, including PRPS2 (a nucleotide bio- 
synthesis enzyme, rank 7), harbouring homozygous deletions in three 
cases, NR3C1 (a glucocorticoid receptor, rank 10) with SNVs in three 
cases, and four PKC-related genes, PRKCZ, PRKCQ, PRKG1 and PRKCE. 
The gene networks show a partial overlap with driverNet applied to the 
TCGA ovarian high-grade serous data (Supplementary Table 16). 
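
The bipartite idea behind this kind of analysis can be illustrated with a toy greedy cover. This is a simplified sketch, not the driverNet implementation: the influence graph, the aberration calls and the outlier events below are invented for illustration, and ranking genes by greedy coverage of outlier events is an assumption about how such a method might prioritize candidates.

```python
from typing import Dict, List, Set, Tuple

Event = Tuple[str, str]  # (case id, gene with outlying expression)


def rank_drivers(
    aberrated: Dict[str, Set[str]],   # gene -> cases carrying an aberration
    influence: Dict[str, Set[str]],   # gene -> genes it is connected to
    outliers: Set[Event],             # expression outlier events
) -> List[Tuple[str, int]]:
    """Greedily rank aberrated genes by the number of still-unexplained
    outlier events they are connected to in the same case."""

    def covered(gene: str) -> Set[Event]:
        return {
            (case, target)
            for case in aberrated[gene]
            for target in influence.get(gene, set())
            if (case, target) in outliers
        }

    remaining = set(outliers)
    candidates = set(aberrated)
    ranking: List[Tuple[str, int]] = []
    while candidates and remaining:
        # sorted() keeps tie-breaking deterministic
        best = max(sorted(candidates), key=lambda g: len(covered(g) & remaining))
        gain = covered(best) & remaining
        if not gain:
            break  # no remaining gene explains any remaining event
        ranking.append((best, len(gain)))
        remaining -= gain
        candidates.remove(best)
    return ranking


# Hypothetical toy data (GENE_X and all case ids are made up)
aberrated = {"TP53": {"c1", "c2"}, "PIK3CA": {"c2"}, "GENE_X": {"c3"}}
influence = {
    "TP53": {"CDKN1A", "MDM2"},
    "PIK3CA": {"AKT1", "MDM2"},
    "GENE_X": {"AKT1"},
}
outliers = {("c1", "CDKN1A"), ("c1", "MDM2"), ("c2", "MDM2"), ("c2", "AKT1")}

ranking = rank_drivers(aberrated, influence, outliers)
print(ranking)
```

In this toy, an aberrated gene with no connected outlier event in its own case is never ranked, which mirrors the requirement in Table 1 that events be coincident with a genomic aberration.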

Having identified candidate driver genes and significantly over- 
represented pathways, we asked how these are distributed among 
individual tumours by clustering a pathway-patient-mutation matrix 
(Supplementary Fig. 9). The abundance of implicated pathways can be 


Table 1 | Analysis of the top somatically aberrated genes influencing expression 

Rank  Gene      gband      SNV or indel  HLAMP  HOMD  Events  P value 
1     TP53      17p13.1    35            0      0     2242    0 
2     PIK3CA    3q26.32    7             0      0     441     1 × 10⁻⁴ 
3     NRAS      1p13.2     2             0      0     271     4 × 10⁻⁴ 
4     EGFR      7p11.2     1             5      0     220     4 × 10⁻⁴ 
5     RB1       13q14.2    5             0      5     184     5 × 10⁻⁴ 
6     PGM2      4p14       1             0      1     172     5 × 10⁻⁴ 
7     PRPS2     23p22.2    0             0      3     171     5 × 10⁻⁴ 
8     PTEN      10q23.31   5             0      3     150     5 × 10⁻⁴ 
9     PRKCE     2p21       0             0      1     136     7 × 10⁻⁴ 
10    NR3C1     5q31.3     3             0      0     130     7 × 10⁻⁴ 
11    CREBBP    16p13.3    1             0      1     119     8 × 10⁻⁴ 
12    CS        12q13.2    1             0      0     108     0.0011 
13    MAN2A2    15q26.1    2             0      1     104     0.0012 
14    HMGCS2    1p12       1             2      0     100     0.0013 
15    HEXA      15q24.1    2             1      0     97      0.0013 
16    ADCY9     16p13.3    2             1      0     91      0.0017 
17    OR4N4     15q11.2    0             0      5     90      0.0017 
18    LCLAT1    2p23.1     0             0      1     85      0.002 
19    DGKI      7q33       2             0      0     82      0.0022 
20    CYP2A6    19q13.2                  0      0     80      0.0024 
21    JAK1      1p31.3                   0      0     78      0.0026 
22    POLR1A    2p11.2     2             0      0     78      0.0026 
23    PLD1      3q26.31                  0      0     69      0.0038 
24    IDH3B     20p13                    0      1     68      0.004 
25    PAPSS2    10q23.2    0             0      3     67      0.0041 
26    PRKX      23p22.33   0             0      2     65      0.0046 
27    TPH2      12q21.1                  0      0     65      0.0046 
28    UGT2B17   4q13.2     0             0      1     63      0.0053 
29    RRM2      2p25.1                   0      0     57      0.0072 
30    ATM       11q22.3                  0      0     55      0.0084 
31    CLCA1     1p22.3     2             0      0     54      0.009 
32    PRKCZ     1p36.33                  0      0     53      0.0095 

Rank, derived by the driverNet algorithm (see Supplementary Methods); gene, 
somatically aberrated gene; gband, chromosomal band containing gene; SNV or 
indel, the number of cases harbouring an SNV or indel in the gene; HLAMP, the 
number of cases harbouring a predicted high-level amplification; HOMD, the 
number of cases harbouring a predicted homozygous deletion; events, number of 
gene expression outliers (see Supplementary Methods) coincident with a genomic 
aberration and where the outlying gene is connected to the aberrated gene; 
P value, statistical significance based on a randomly generated background 
distribution (Supplementary Methods). 

seen to be only partially related to the total number of mutations in a 
case, groups 1 and 2 having on average fewer mutations per case. The 
frequent involvement of pathways with p53, PTEN and PIK3CA as 
members, is noted (Supplementary Fig. 9); however, the case group- 
ings also vary by the progressive inclusion of additional pathways (for 
example, WNT signalling, integrin signalling, ERBB signalling, hypoxia 
and PI3K). More than two thirds of cases contained one or more muta- 
tions in the actin/cytoskeletal functions group of genes (Supplementary 
Fig. 9). Some 12% of cases did not contain somatic aberrations in any of 
the frequent drivers or cytoskeletal genes (Supplementary Table 12). 
This suggests that primary TNBCs are mutationally heterogeneous from 
the outset, with some patients’ tumours having a small number of 
implicated pathways and few mutations, whereas other patients present 
with tumours containing extensive mutation burdens and multiple 
pathway involvement. 

Motivated by the observation that early primary TNBCs show a 
wide variation of mutation content, we asked whether the clonal 
composition of these primary cancers is similarly varied. We and 
others have shown how deep-frequency measurements of allelic 
abundance can be used to study tumour clonal evolution. Clonal 
mutation frequency, a compound measure of clonal complexity, 
(Fig. 4a) can be estimated from allele abundance, once the influence 
of copy number states, regional loss of heterozygosity (LOH state) and 
tumour cellularity have been considered (although we note that 
approximately 68% of SNVs in this study are in diploid, neutral 
regions). To extend allelic abundance measurements to estimation of 
clonal frequencies, we implemented a Dirichlet process clustering 
model (pyclone; Supplementary Methods and Supplementary Fig. 10) 
that simultaneously estimates the genotype and clonal frequency given 
a list of deeply sequenced mutations and their local copy number and 
heterozygosity contexts. 
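
How allelic abundance relates to clonal frequency can be illustrated with a much simpler calculation than the Dirichlet process model. The sketch below is not pyclone: it assumes that the number of mutated copies and the tumour copy number at the locus are already known, and that all tumour cells share that copy number, then inverts the expected allele fraction. The function and parameter names are illustrative.

```python
def clonal_frequency(
    vaf: float,
    cellularity: float,
    tumour_cn: int = 2,
    mutated_copies: int = 1,
    normal_cn: int = 2,
) -> float:
    """Estimate the fraction of tumour cells carrying a mutation.

    Expected allele fraction under the simplifying assumptions:
        vaf = cellularity * phi * mutated_copies
              / (cellularity * tumour_cn + (1 - cellularity) * normal_cn)
    Solving for phi gives the estimate, clamped to [0, 1].
    """
    if not 0.0 < cellularity <= 1.0:
        raise ValueError("cellularity must be in (0, 1]")
    if not 1 <= mutated_copies <= tumour_cn:
        raise ValueError("mutated_copies must be between 1 and tumour_cn")
    alleles_per_cell = cellularity * tumour_cn + (1.0 - cellularity) * normal_cn
    phi = vaf * alleles_per_cell / (cellularity * mutated_copies)
    return min(1.0, max(0.0, phi))


# Heterozygous mutation in a diploid, copy-neutral region, 80% tumour cells
print(clonal_frequency(vaf=0.30, cellularity=0.8))

# Same allele fraction of one third, but in a hemizygously deleted region
print(clonal_frequency(vaf=1 / 3, cellularity=0.5, tumour_cn=1))
```

The two calls show why copy number and cellularity have to be considered before allele abundances are compared: the same raw allele fraction can correspond to very different clonal frequencies depending on the local context.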

Using the set of deeply sequenced (median 20,000×), validated 
SNVs, our analysis revealed (Fig. 4b) that groups of mutations within 
individual cases have different clonal frequencies, indicative of distinct 
clonal genotypes. Remarkably, the tumours exhibit a wide spectrum of 
modes over clonal frequencies (Fig. 4b and Supplementary Fig. 11), 
with some cases showing only one or two frequency modes (Fig. 4b), 
indicating a smaller number of clonal genotypes, whereas other 
tumours exhibit multiple clonal frequency modes, indicating more 
extensive clonal evolution. Consistent with early ‘driver gene’ status, 
mutations in known tumour suppressors such as p53 tend to occur in 
the highest clonal frequency group in most tumours. However, in some 
cases (for example, SA219, SA236; Fig. 4b, Supplementary Fig. 11) p53 
resides in lower-abundance clonal frequency groups (Supplementary 
Fig. 12 and Fig. 3a), suggesting that it was not the founding event. 
Although the number of clonal frequency modes tends to increase with 
the number of mutations, the relationship is not strictly linear (Fig. 4c). 
To determine whether basal and non-basal cancers differ in their 
clonality, we compared the distribution of clonal modes (clusters) by 
case and as an overall distribution, and note that basal TNBCs have 
more clonal frequency modes than non-basal TNBCs (Fig. 4c). Both of 
these distributions emphasize a key observation; namely, that at the 
time of diagnosis TNBCs already display a widely varying clonal evolu- 
tion that mirrors the variation in mutational evolution. 

Finally, we asked where key pathways appear in the distribution of 
clonal frequency groups. We examined the clonal frequency of genes 
in each pathway and ascertained if there was a deviation away from the 
distribution of clonal frequency for all mutations. As expected, 
pathways involving p53 and PIK3CA showed significantly skewed 
distributions (Wilcoxon, q < 0.01; Fig. 3b and Supplementary Fig. 12) 
towards higher clonal frequencies, consistent with their roles in early 
tumorigenesis (Fig. 3a and Supplementary Table 17). Intriguingly, 
pathways with cytoskeletal genes such as myosins, laminins, collagens 
and integrins tend to have lower median clonal frequencies, suggesting 
that somatic mutations in these genes are acquired much later (Fig. 3b). 
Notably, the median clonal frequency for Reactome pathway ‘p53 

Figure 4 | Clonal evolution in TNBC. a, Schematic representation of 
integration of CNA, LOH, allelic abundance measurements and normal cell 
contamination for clonal frequency estimation using a Dirichlet process (DP) 
model (left). Example of a mixture of three clonal genotypes composed of four 
mutations (A, B, C, D) and their resulting clonal frequencies. b, Estimated 
clonal frequencies for four cases are shown as the distribution of posterior 
probabilities from the pyclone model (Supplementary Methods). Clonal 
frequency distributions are coloured by their frequency group membership. 
c, Left, relationship of mutation abundance (synonymous (Syn) and non- 
synonymous (Non-syn)) and the inferred number of clonal clusters. Middle, 
distribution and kernel density (red line) of the number of inferred clonal 
clusters over 54 TNBCs. Right, kernel density distribution of clonal clusters for 
basal (red) and non-basal (grey) tumours. 


pathway feedback loops’, including 46 mutations in ATM, ATR, 
NRAS, PIK3CA, PTEN, SIAH1 and p53, was 73% (Wilcoxon, 
q = 0.0007), whereas ‘integrin cell surface interactions’, including 23 
mutations in integrin, laminin and collagen genes, had a median clonal 
frequency of 42% (Wilcoxon, q = 0.9569). 
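
The comparison of each pathway against the overall distribution, followed by multiple-testing adjustment, can be sketched as below. This is an assumption-laden illustration rather than the study's code: it uses a two-sided Wilcoxon rank-sum test with a normal approximation (no tie or continuity correction) and Benjamini-Hochberg adjustment to obtain q values, and the clonal frequencies and pathway names are made up.

```python
from math import erfc, sqrt
from typing import List, Sequence


def rank_sum_p(sample: Sequence[float], background: Sequence[float]) -> float:
    """Two-sided Wilcoxon rank-sum P value via the normal approximation."""
    values = list(sample) + list(background)
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1   # shared 1-based rank for a tie block
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n1, n2 = len(sample), len(background)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return erfc(abs(z) / sqrt(2))


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values (q values)."""
    m = len(p_values)
    order = sorted(range(m), key=p_values.__getitem__)
    q_values = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):         # walk from the largest P value down
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


# Hypothetical clonal frequencies
overall = [0.15, 0.22, 0.30, 0.35, 0.41, 0.48, 0.52, 0.60, 0.66, 0.71]
pathways = {
    "early-like pathway": [0.75, 0.80, 0.88, 0.92, 0.95],
    "background-like pathway": [0.20, 0.33, 0.47, 0.58, 0.69],
}

p_values = [rank_sum_p(freqs, overall) for freqs in pathways.values()]
q_values = benjamini_hochberg(p_values)
for name, q in zip(pathways, q_values):
    print(f"{name}: q = {q:.4f}")
```

A pathway whose mutations sit at consistently high clonal frequencies receives a small q value, whereas one whose frequencies track the overall distribution does not, which is the pattern described for the p53 and integrin pathways.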

Primary TNBCs are still treated as if they were a single disease entity, 
yet it is clear they do not behave as a single entity in response to current 
therapies. Here we show for the first time, using next-generation 
sequencing mutational profiling methods, that treatment-naive 
TNBCs display a complete spectrum of mutational and clonal evolu- 
tion, with some patients’ tumours showing only a few somatic coding 
sequence point mutations with a limited number of molecular pathways 
implicated, whereas other patients’ tumours exhibit considerable addi- 
tional mutational involvement. Moreover, the clonal heterogeneity of 
these cancers is also a continuum, with some patients presenting with 
low-clonality cancers and other cases exhibiting more extensive clonal 
evolution at diagnosis. In this respect, the basal expression subtype of 
TNBCs also tends to show higher clonality at diagnosis, although the 
relationship is not exact. 

In clonally evolving tumours, identification of genes by single gene 
mutation frequency measurements will probably favour early driver 
genes, because the subsequent involvement of multiple additional 
pathways during tumour progression is unlikely to be observed as a 
frequent single gene mutation. The clonality analysis emphasizes this 
point: known drivers such as p53, PIK3CA and PTEN have among the 
highest clonal frequencies, whereas mutations in cell shape/motility 
and ECM-signalling genes appear in the lower clonal frequency 
groups, distributed over many genes. Although p53 somatic mutations 
are clearly early events, the clonal frequencies observed in some TNBC 
suggest that they are not always the first event, raising a question about 
what drives early clonal expansion in some of these cancers. Our 
findings suggest that each TNBC at the time of primary diagnosis 
may be at a very different phase of molecular progression, with possible 
implications for approaches to the biology of low clonality versus high 
clonality primary tumours. 


METHODS SUMMARY 


The genomes and transcriptomes of 104 TNBCs were profiled with Affymetrix 
SNP6.0 arrays (all cases), RNA-seq (80 cases; Illumina GAII), and whole exome/ 
genome sequencing (65 cases; tumour and normal DNA). Exomes were obtained 
using Agilent’s Human All Exon SureSelect Target Enrichment System v.1 fol- 
lowed by Illumina GAII sequencing, and whole genomes were sequenced using 
Life Technologies SOLiD system. Data were analysed using computational 
approaches to detect somatic SNVs, indels, copy number alterations, gene 
fusions and gene expression patterns. Predictions were then validated using 
orthogonal experimental assays, including targeted ultra-deep amplicon sequencing 
of SNVs to ~20,000× redundancy. We determined single genes under selection 
using a statistical approach that considers patient-specific background mutation and 
transition/transversion rates. Mutations predicted to alter transcriptional profiles 
were determined using an integrated bipartite graph-based method (driverNet) that 
associates genomic aberrations with outlying expression patterns informed by pre- 
defined pathway gene sets. Disrupted pathways were determined using the Reactome 
FI Cytoscape plugin. Clonal analysis was performed (cases with >10 mutations) 
using a Dirichlet process statistical model that simultaneously estimates clonal fre- 
quencies and mutation genotype given deeply sequenced somatic SNVs and copy 
number estimates. Experimental assays and analytical methodology are detailed in 
the Supplementary Information. 

All cancers carry somatic mutations in their genomes. A subset, 
known as driver mutations, confer clonal selective advantage on 
cancer cells and are causally implicated in oncogenesis, and the 
remainder are passenger mutations. The driver mutations and 
mutational processes operative in breast cancer have not yet been 
comprehensively explored. Here we examine the genomes of 100 
tumours for somatic copy number changes and mutations in the 
coding exons of protein-coding genes. The number of somatic muta- 
tions varied markedly between individual tumours. We found strong 
correlations between mutation number, age at which cancer was 
diagnosed and cancer histological grade, and observed multiple 
mutational signatures, including one present in about ten per cent 
of tumours characterized by numerous mutations of cytosine at TpC 
dinucleotides. Driver mutations were identified in several new cancer 
genes including AKT2, ARID1B, CASP8, CDKN1B, MAP3K1, 
MAP3K13, NCOR1, SMARCD1 and TBX3. Among the 100 
tumours, we found driver mutations in at least 40 cancer genes 
and 73 different combinations of mutated cancer genes. The results 
highlight the substantial genetic diversity underlying this common 
disease. 

The coding exons of 21,416 protein coding genes and 1,664 
microRNAs were sequenced and copy number changes examined in 
100 primary breast cancers, 79 of which were oestrogen receptor 
positive (ER+) and 21 of which were oestrogen receptor negative 
(ER−) (Supplementary Table 1). We sequenced normal DNAs from 
the same individuals to exclude inherited sequence variation. We 
identified 7,241 somatic point mutations: 6,964 were single-base sub- 
stitutions, of which 4,737 were predicted to generate missense; 422, 
nonsense; 158, an essential splice site; 8, stop codon read-through; and 
1,637, silent changes in protein sequence. Two substitutions were 
found in microRNAs. There were 277 small insertions or deletions 
(71 and 206, respectively), of which 231 introduced translational 
frameshifts and 46 were in-frame (Supplementary Table 2). Analyses 
of copy number yielded 1,712 homozygous deletions and 1,751 regions 
of increased copy number (amplification) (Supplementary Table 3). 
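
The mutation tallies in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below assumes that the two microRNA substitutions are counted among the 6,964 single-base substitutions, which is the reading under which the categories add up; the dictionary keys are just labels.

```python
# Single-base substitutions by predicted consequence
substitutions = {
    "missense": 4737,
    "nonsense": 422,
    "essential splice site": 158,
    "stop codon read-through": 8,
    "silent": 1637,
    "microRNA": 2,   # assumed to be part of the 6,964 substitutions
}

# Small insertions and deletions, split two ways
indels_by_type = {"insertion": 71, "deletion": 206}
indels_by_effect = {"frameshift": 231, "in-frame": 46}

total_substitutions = sum(substitutions.values())
total_indels = sum(indels_by_type.values())
total_point_mutations = total_substitutions + total_indels

print(total_substitutions, total_indels, total_point_mutations)
```

Both ways of splitting the indels give the same total, and substitutions plus indels reproduce the reported number of somatic point mutations.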

Somatic driver substitutions and small insertions/deletions (indels) 
were identified in cancer genes previously implicated in breast cancer 
development, including AKT1, BRCA1, CDH1, GATA3, PIK3CA, 
PTEN, RB1 and TP53 (Supplementary Table 4; see also 
http://www.sanger.ac.uk/genetics/CGP/Census). Likely drivers were also found in 
cancer genes involved in other cancer types, including APC, ARID1A, 
ARID2, ASXL1, BAP1, KRAS, MAP2K4, MLL2, MLL3, NF1, SETD2, 
SF3B1, SMAD4 and STK11. 

To identify new cancer genes, we searched for non-random cluster- 
ing of somatic mutations in each of the 21,416 protein-coding genes 
and sequenced a subset of genes highlighted by this analysis in a follow- 
up series of 250 breast cancers (Supplementary Tables 5 and 6). 
Persuasive evidence was found for nine new cancer genes (Fig. 1a 
and Supplementary Fig. 1). Of these, ARID1B, CASP8, MAP3K1, 
MAP3K13, NCOR1, SMARCD1 and CDKN1B had the truncating 
mutations and often biallelic inactivation characteristic of inactivated, 
potentially recessive cancer genes (Supplementary Table 4). AKT2 is 
probably an activated, dominantly acting cancer gene. The effects of 
TBX3 mutations on its function are unclear. 

MAP3K1 encodes a serine/threonine protein kinase that regulates 
the activity of the ERK MAP kinase (the extracellular signal-regulated 
mitogen-activated protein kinase), JUN kinase and p38 signalling 
pathways implicated in control of cell proliferation and death. 
Somatic mutations in MAP3K1 were observed in 6% of breast cancers, 
predominantly in ER+ cases. Most were protein truncating. MAP3K1 
phosphorylates and activates the protein encoded by MAP2K4, a 
known recessive cancer gene with inactivating mutations in breast 
and other cancers. In turn, MAP2K4 phosphorylates and activates 
the JUN kinases MAPK8 (also known as JNK1) and MAPK9 (also 
known as JNK2), which phosphorylate JUN, TP53 and other 
transcription factors mediating cellular responses to stress. Truncating 
mutations and other non-synonymous mutations were also found in 
MAP3K13, which encodes a kinase that phosphorylates and activates 
MAP2K7. MAP2K7 phosphorylates and activates MAPK8 and 
MAPK9 (ref. 4). Thus, in breast cancer, inactivating mutations in 
MAP3K1, MAP2K4 and MAP3K13 are predicted to abrogate signalling 
pathways that activate JUN kinases (Fig. 1b). 

In the serine/threonine kinase gene AKT2, we identified a single 
somatic missense mutation, Glu 17 Lys, that is identical to the recurrent, 
activating mutation in AKT1 previously reported in breast cancer. 

Figure 1 | New cancer genes established in this study and involvement of the 
JUN kinase signalling pathway. a, Representations of the protein-coding 
sequences and major domains in cancer genes established in this study. Somatic 
mutations are shown as circles: truncating (red), essential splice site (blue), 
missense (green) and in-frame indel (yellow). The red lines indicate the 
positions of large homozygous deletions. aa, amino acids. b, Pathways 
regulating the JUN kinases MAP2K7 and MAP2K%8, indicating genes with 
mutations in this series. Genes in green are activated by mutations, whereas 
genes in red are inactivated. 

Thus, AKT2 is also probably a cancer gene, albeit one infrequently 
implicated in breast cancer development. Because AKT 
phosphorylates and inhibits MAP2K4 (ref. 7) and mutations in PIK3CA and 
PTEN can result in AKT activation, about half of breast cancers 
may have abrogation of JUN kinase signalling (Fig. 1b). The biological 
consequences of the reduction in JUN kinase activity are likely to be 
diverse and complex, but may include destabilization and consequent 
inactivation of TP53 with disruption of pro-apoptotic cellular 
signalling in response to stress. 

We observed truncating mutations and homozygous deletions of 
NCOR1. In addition to mediating repression of thyroid-hormone and 
retinoic-acid receptors by promoting chromatin condensation and 
preventing access of the transcription machinery, NCOR1 participates in 
ligand-dependent transcriptional repression by oestrogen receptor 
alpha. We also identified inactivating mutations in SMARCD1 and 
ARID1B, further implicating aberrant chromatin regulation. The 
encoded proteins of both are components of the SWI/SNF chromatin 
remodelling complex, which incorporates the products of several 
established recessive cancer genes, including PBRM1, ARID1A, SMARCB1 
and SMARCA4 (refs 3, 12-14). 

We found three truncating mutations and a missense mutation 
in CDKN1B. Two truncating mutations in CDKN1B in cancer have 
previously been reported, and collectively the results confirm that 
CDKN1B is a cancer gene. CDKN1B (also known as p27 or KIP1) 
normally inhibits activation of cyclin E/CDK2 and cyclin D/CDK4 
complexes, thus preventing cell cycle progression at phase G1. 

Three truncating mutations were observed in CASP8. CASP8 is a 
member of the cysteine/aspartic acid protease family that forms a 
complex with the FAS cell surface receptor to promote programmed 
cell death. Inactivation of CASP8 in these cancers is therefore pre- 
dicted to abrogate apoptosis in response to a variety of signals. 

Six tumours had mutations in TBX3, which encodes a T-box tran- 
scription factor that regulates stem cell pluripotency-associated and 
reprogramming factors and is involved in normal breast 
development. Constitutional inactivating mutations in TBX3 cause 
ulnar-mammary syndrome, in which there is failure of breast and apocrine 
development coupled with abnormalities of limb morphogenesis. 
Three breast cancers had in-frame deletions, one of Thr 210 and the 
other two of Asn 212, a residue through which the T-box domain binds 
to DNA. Despite the presence of truncating mutations in three further 
cases, the recurrent and clustered in-frame deletions and the finding 
that all mutations were heterozygous suggests that they may not simply 
result in loss of function. Indeed, recent reports suggest that increased 
activity of TBX3 is likely to contribute to oncogenesis. The proportion 
of stem-like cells in breast cancers is increased by oestrogen-dependent 
activation of the TBX3 pathway. Moreover, TBX3 overexpression 
increases the efficiency of the derivation of induced pluripotent stem 
cells and the ability of cancer cells to form tumours. 

Further supporting their role in oncogenesis, three of the nine newly 
identified somatically mutated cancer genes, MAP3K1, CASP8 and 
TBX3, carry inherited common variants, identified by genome-wide 
association studies, that confer small increased risks of breast cancer. 
Several additional genes showed truncating mutations and are bio- 
logically plausible candidate cancer genes contributing infrequently 
to breast cancer development. Some, including ASXL2, ARID5B, 
KDM3A, SETD1A, CHD1, NCOR2, HDAC9 and CTCF, encode proteins 
that regulate chromatin structure, whereas others, including FANCA 
and ATR, are involved in DNA repair. 

Cancers arise through successive waves of clonal expansion depend- 
ent on the sequential acquisition of driver mutations. A central para- 
meter of cancer development is therefore the number of driver 
mutations required for conversion of a normal cell into a symptomatic 
cancer. Estimates based on cancer age-incidence curves have indicated 
that approximately five rate-limiting steps underlie the development of 
common adult solid tumours”. Experimental studies have similarly 
indicated that a limited number of key genetic changes are required for 
Figure 2 | The landscape of driver mutations in breast cancer. Each of the 40 
cancer genes in which a driver mutation or copy number change has been 
identified is listed down the left-hand side. The number of mutations in each 


neoplastic transformation of human cells”. Our systematic genome 
analysis now provides a direct survey of the landscape of driver muta- 
tions in breast cancer. 

Somatic driver point mutations and/or copy number changes in at 
least 40 cancer genes were implicated in the development of the 100 
breast cancers (Fig. 2, Supplementary Tables 3 and 4, and Supplemen- 
tary Methods). The maximum number of mutated cancer genes in an 
individual cancer was 6, but 28 cases only showed a single driver. Thus, 
there seems to be substantial variation in the number of drivers. In 
some cases, the presence of multiple drivers was associated with 
subclonal evolution of the cancer (Supplementary Statistical 
Analyses). However, in others multiple drivers were in the root cancer 
clone. Seven of the 40 cancer genes (TP53, PIK3CA, ERBB2, MYC, 
FGFR1/ZNF703, GATA3 and CCND1) were mutated in more than 
10% of cases. Collectively these contributed 58% of driver mutations 
(144 of 250). Therefore, 33 mutated cancer genes, each contributing 
relatively infrequently, were responsible for the remaining 42% of 
driving genetic events. We observed 73 different combinations of 
mutated cancer genes. Thus, most breast cancers differed from all 
others (Fig. 2 and Supplementary Fig. 2). This assessment of the 
genetic diversity of breast cancer is probably conservative because, 
for several reasons, it underestimates the number of mutated cancer 
genes in each case. 

At present, we know little about the mutational processes responsible 
for the generation of somatic mutations in breast and other cancers. In 
the 100 breast cancers analysed here, there was substantial variation in 
the total numbers of base substitutions and indels between individual 
cases (Fig. 3a). There was also considerable diversity of mutational 
pattern, ranging from cases in which C•G→T•A transitions predominated 
to cases in which all transitions and transversions made equal 
contributions (Fig. 3b and Supplementary Fig. 3). Taken together, the 
results suggest that multiple distinct mutational processes are operative. 
For most of these processes, the underlying mechanism is unknown. 

To illustrate one mutational signature in detail, we selected the ER+ 
breast cancer with the largest number of base substitutions in the 
series, PD4120 (Fig. 3a, asterisk; Fig. 4). The mutation spectrum of 
this case was distinctive, featuring C•G→T•A, C•G→G•C and 
C•G→A•T mutations and very few mutations at A•T base pairs 
(Fig. 4a). To characterize this process further, we examined the 
sequence context in which the mutations occurred (in the following 
discussion, mutations at C•G base pairs are represented as the change 
at the C base) and found pronounced overrepresentation of thymine 
gene in the 100 tumours is shown (rows), as is the number of driver mutations 
in each breast cancer (columns). Point mutations and copy number changes are 
coloured red and blue, respectively. 


immediately 5’ to the mutated cytosines. Thus, in PD4120 the large 
majority of mutations were of cytosine at TpC dinucleotides (Fig. 4b). 
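
The sequence-context tally described above can be illustrated with a minimal Python sketch. It assumes a reference sequence held as a string and 0-based positions of mutated C•G base pairs; the function name `five_prime_context` and the toy sequence are hypothetical and are not part of the study's pipeline. Mutations that fall on a G in the reference are read from the opposite strand, so that every mutation is counted as a change at the C base.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def five_prime_context(reference, positions):
    """Tally the base immediately 5' of each mutated cytosine.

    `reference` is the reference-strand sequence and `positions` are 0-based
    indices of mutated C:G base pairs (both are illustrative inputs).
    """
    counts = Counter()
    for pos in positions:
        base = reference[pos]
        if base == "C" and pos > 0:
            # Cytosine is on the reference strand: its 5' neighbour is the previous base.
            counts[reference[pos - 1]] += 1
        elif base == "G" and pos + 1 < len(reference):
            # Cytosine is on the opposite strand: its 5' neighbour is the
            # complement of the base that follows the G on the reference strand.
            counts[COMPLEMENT[reference[pos + 1]]] += 1
    return counts


reference = "ATCGGATTCAGA"   # toy sequence, not real data
mutated = [2, 3, 8, 10]      # toy positions of mutated C:G base pairs
print(five_prime_context(reference, mutated))
```

A large excess of `T` in such a tally would correspond to the TpC enrichment reported for PD4120.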

To obtain further insight into the underlying mechanism in this 
case, we looked for differences in mutation prevalence between the 
transcribed and untranscribed strands of the 21,416 genes analysed 
(‘strand bias’) and found a higher prevalence of C→T, C→G and 
C→A mutations on transcribed strands (P = 0.02) (Fig. 4c and 
Supplementary Table 7). This strand bias raises the possibility that 
transcription-coupled nucleotide excision repair (NER) has been 
operative. NER removes bulky DNA adducts that distort the DNA 
Figure 3 | The variation in numbers and types of mutation between 
individual breast cancers. a, Numbers of small indels and base substitutions in 
the protein-coding exons of each of the 100 breast cancers studied. The cases are 
ranked according to the number of base substitutions. *Breast cancer PD4120 
(see main text). b, Mutation spectrum of four primary tumours with diverse 
mutational patterns. 
double helix, notably pyrimidine dimers due to ultraviolet light exposure 
or adducts due to mutagens in tobacco smoke”®. There is a form of 
NER, recruited by RNA polymerase II, that is operative only on the 
transcribed strand of each gene and thus introduces a strand bias for 
mutations”. Therefore, one hypothesis to account for the strand bias 
in PD4120 is past involvement of NER, in turn implicating exposure to 
a bulky DNA-damaging agent, either of endogenous or exogenous 
origin. However, we cannot exclude the possibility that other DNA 
damage or repair processes generate a strand bias. At least eight addi- 
tional cancers in this series had a very similar mutational spectrum, 
sequence context and strand bias (Supplementary Fig. 4 and Sup- 
plementary Statistical Analysis). None had been treated before 
excision of the cancer. 

The somatic mutations in a cancer genome accumulate over a 
patient’s lifetime, during the lineage of mitotic divisions from the 
fertilized egg to the cancer cell. Some are acquired while cells in the 
lineage are biologically normal, whereas others are acquired after 
acquisition of the neoplastic phenotype. However, the relative propor- 
tions accumulated in these two phases are unknown. To explore this 
question, we examined the relationship between the total numbers of 
somatic base substitutions and the age at diagnosis in the 100 tumours 
(Fig. 5). In both ER+ and ER- cancers, no correlation was observed 
(P=0.33 and 0.14 respectively). If most somatic mutations in a 
cancer genome are acquired in normal tissues before neoplastic 
transformation, the later the onset of the cancer the longer this part 
of the lineage is likely to have been and, consequently, the higher the 
number of mutations. The absence of a correlation therefore suggests 
that most mutations in breast cancer genomes occur after the initiating 
driver event. 

Figure 4 | The mutational signature of ER+ breast cancer PD4120. a, The 
mutational spectrum. b, The sequence context of C→T, C→G and C→A 
mutations. The central blue bar indicates the position of the mutated 
cytosine and the bases 5′ and 3′ are numbered on the horizontal axis. 
c, Strand bias of mutations showing substitutions at C bases and at T 
bases according to whether they are on the transcribed (T) or 
untranscribed (U) strands of the genes screened. 

Figure 5 | The relationship between age at breast cancer diagnosis and all 
substitutions, and for C→T substitutions at CpG sites. a, b, Data from the 
79 ER+ breast cancers. c, d, Data from the 21 ER− breast cancers. 

We then considered separately the subset of somatic mutations constituted 
by C•G→T•A substitutions at CpG dinucleotides, because 
this mutational pattern is observed in non-diseased tissues, manifesting 
prominently in normal germline variation. This subset showed a strong 
positive correlation with the age at cancer diagnosis in ER− cancers 
(P=1.2X 10’), supporting the proposition that it is enriched in 
mutations occurring in normal tissues and that, overall, other mutation 
classes occur later. By contrast, ER+ cancers showed no correlation 
between C•G→T•A substitutions at CpG dinucleotides and age at 
diagnosis (P = 0.27). The basis for this pronounced difference is 
unclear, but potentially highlights a profound divergence in the 
dynamics of mutation acquisition between these two major subclasses 
of breast cancer. 

In clinical practice, breast cancers are graded microscopically on the 
basis of mitotic counts, pleomorphism of cancer cell nuclei and extent of 
tubule formation, which are then collected into an overall grade score. 
High scores indicate large numbers of mitoses, substantial tumour cell 
pleomorphism and little tubule formation, and are generally associated 
with more rapid progression. Significant correlations were not observed 
between numbers of driver mutations and grade scores (Supplementary 
Statistical Analysis). However, there were strong positive correlations 
between the total number of substitutions (that is, drivers and 
passengers) and mitosis and tubule scores (P= 0.0002 and 0.002 
respectively), which remained significant after multiple testing correc- 
tions. The causal relationships between these features are unclear. 
However, because most substitutions are likely to be biologically inert 
passengers, it is possible that the biological state of high-grade breast 
cancers may be responsible for generating increased numbers of muta- 
tions, rather than the converse. 

The panorama of mutated cancer genes and mutational processes in 
breast cancer is becoming clearer, and a sobering perspective on the 
complexity and diversity of the disease is emerging. Driver mutations 
are operative in many cancer genes. A few are commonly mutated, but 
many infrequently mutated genes collectively make a substantial con- 
tribution in myriad different combinations. Multiple somatic muta- 
tional processes have been operative. Ultimately, characterization of 
the genomes of breast cancer, and others, will provide a robust and 
biologically meaningful classification generating insights into the 
clinical heterogeneity of the disease and influencing strategies to find 
new modes of prevention and treatment. 

METHODS SUMMARY 


DNAs from breast cancers and normal tissues from the same individuals were 
subjected to in-solution enrichment for the coding exons of 21,416 genes and 1,664 
microRNAs (Agilent 50 Mb Exome) and subsequently sequenced on Illumina 
GAIIx machines. The average exome coverage (at a minimum depth of 30×) was 
70%. Following alignment to the reference genome using BWA”, somatic sub- 
stitutions and small indels were identified using CaVEMan and Pindel calling 
algorithms, respectively*””. DNAs from cancers were also hybridized to 
Affymetrix SNP6 arrays and analysed using the ASCAT algorithm*® for copy 
number and zygosity changes. Confirmation by orthogonal sequencing technologies 
was attempted for all putative somatic mutations and follow-up of a subset of genes 
in additional case series was undertaken by targeted PCR and Illumina sequencing. 
Identification of genes showing evidence of selection was conducted as previously 
described’. 

Informed consent was obtained from all subjects and ethical approval obtained 
from Cambridgeshire 3 Research Ethics Committee (ref 09/H0306/36). 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 


Patient samples. Informed consent was obtained from all subjects and ethical 
approval obtained from Cambridgeshire 3 Research Ethics Committee (ref 09/ 
H0306/36). Collection and use of patient samples were approved by the appro- 
priate IRB of each Institution. In addition, this study and usage of its collective 
materials had specific IRB approval. 

Exome enrichment and sequencing. Genomic libraries were prepared using the 
Illumina Paired End Sample Prep Kit following the manufacturer’s instructions. 
Enrichment was performed as described previously’, using the Agilent SureSelect 
Human All Exon 50Mb kit following the manufacturer’s recommended protocol 
but excluding pre-enrichment PCR amplification. Each exome was sequenced 
using the 75 or 76-bp paired-end protocol, on an Illumina GAII or HiSeq DNA 
Analyser, to produce approximately 10 Gb of sequence per exome. Sequencing 
reads were aligned to the human genome (NCBI build 37) using the BWA algo- 
rithm on default settings**. Reads which were unmapped, PCR-derived duplicates 
or outside the targeted region of the genome were excluded from the analysis. The 
remaining uniquely mapping reads (~60%) provided 60-80% coverage over the 
targeted exons at a minimum depth of 30×. 
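
As an illustration of the coverage figure quoted above, the following Python sketch computes the fraction of targeted bases covered at or above a minimum depth. It assumes that per-base depths over the targeted exons are already available as a list of integers; the function name `fraction_covered` and the toy depths are hypothetical.

```python
def fraction_covered(depths, min_depth=30):
    """Return the fraction of targeted bases with depth >= min_depth.

    `depths` holds one read depth per targeted base (illustrative input).
    """
    if not depths:
        raise ValueError("no targeted bases supplied")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths)


toy_depths = [10, 30, 45, 29, 100]   # toy values, not real data
print(f"{fraction_covered(toy_depths):.0%} of bases covered at 30x or more")
```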

Sequencing of pooled PCR amplimers. Selected genes were targeted for follow- 
up investigations in 250 additional breast cancers by sequencing of pooled PCR 
products. An 8-bp index was introduced during amplification to enable sequence 
data from individual tumours to be identified in downstream analyses. 

For each amplimer, a primary PCR was performed using gene-specific primers 
modified with the inclusion of a common upstream adaptor sequence. A secondary 
PCR was performed using primers complementary to the common adaptor 
sequences. The reverse secondary primer contained the internal index, and 96 
different indexed primers were used to enable 96 different DNAs to be pooled 
before sequencing. The primary and secondary PCR amplifications were performed 
as a simultaneous multiplex reaction. Primer sequences are available on request. 

For each amplimer, PCR was performed in batches of 96 DNA samples. 

Following amplification, the 96 PCR products were pooled, purified using a 
QiaQuick column (Qiagen) and quantified on a Bioanalyser (Agilent). Pooled 
reactions from different amplimers (up to 50) were normalized for concentration 
and subsequently also pooled to produce the final template used for sequencing on 
a single lane of an Illumina GAII DNA Analyser (~5,000 amplimers per lane). 
Amplimers which failed PCR were excluded from the pooling experiments. The 
subsequent sequence reads were aligned with BWA and resulted in coverage 
typically exceeding 500× per individual sample amplimer. 
Variant detection. The CaVEMan (cancer variants through expectation maximiza- 
tion) algorithm was used to call single nucleotide substitutions’. This uses a naive 
Bayesian classifier to estimate the posterior probability of each possible genotype 
(wild type, germline, somatic mutation) at each base. We applied several 
post-processing filters to the set of initial CaVEMan mutation calls to remove variants 
reported in poor-quality sequence and increase the specificity of the output. 

To call insertions and deletions, we used split-read mapping implemented as a 
modification of the Pindel algorithm*’. This algorithm searches for reads where 
one end is anchored on the genome and the other end can be mapped with high 
confidence in two (split) portions, spanning a putative indel. Post-processing 
filters were applied to the output to improve specificity. 

Mutations were annotated to Ensembl version 58. 

Variant validation. Validation of all 7,241 putative somatic variants in the primary 
screen of 100 tumours and all variants found in the follow-up of 250 cases was 
attempted by either capillary resequencing or 454 pyrosequencing of PCR products 
spanning the mutation in the tumour and the normal pair. Where independent 
validation failed (approximately 20%) variants were reported to be somatic if 
manual inspection of the aligned sequence reads provided strong evidence to 
support their validity. 

Identification of likely driver base substitutions and indels. A subset of the 
7,241 substitution and indel somatic mutations identified in the exome screen were 
classified as ‘likely driver mutations’ using conservative criteria. To do this, we 
identified the established cancer genes from the Cancer Gene Census 
(http://www.sanger.ac.uk/genetics/CGP/Census/) that are known to be mutated 
by base substitutions and indels to contribute to cancer development. We then 
classified as likely driver mutations those that conformed to the known patterns 
of cancer-causing mutation for each cancer gene. Thus, for recessive cancer genes 
truncating mutations, essential splice site mutations and homozygous deletions 
were included. Missense mutations were also included where they had been seen 
previously or conformed to the known pattern of missense mutation in each 
gene (COSMIC database; http://www.sanger.ac.uk/genetics/CGP/cosmic/). For 
established, dominantly acting cancer genes, we included mutations that had been 
previously registered in COSMIC. For the new cancer genes established in this 
study, we applied essentially the same rules. However, for the recessive cancer genes, 
to be conservative we did not include missense variants (other than the single 
variant in MAP3K1, which is almost certainly disruptive to the function of the 
protein). We included the variant in AKT2 because it is identical in nature to the 
recurrent variant in AKT1, and we included all TBX3 mutations. As indicated in the 
main text, we may have both underestimated and overcalled some somatic variants 
as drivers using this approach. However, the number of erroneous calls is likely to be 
small and overall we have probably underestimated the number of driver muta- 
tions. For the calling procedure for likely driver copy number variants, see below. 
Detection of copy number variation. Single nucleotide polymorphism (SNP) 
array hybridization on the SNP6.0 platform was done according to Affymetrix 
protocols and as described at 
http://www.sanger.ac.uk/cgi-bin/genetics/CGP/cghviewer/CghHome.cgi. 

Copy number analysis was performed using ASCAT (version 2.1) taking into 
account non-neoplastic cell infiltration and tumour aneuploidy”, and resulted in 
integral allele-specific copy number profiles for the tumour cells. Amplifications in 
the 100 samples analysed were called if copy number was ≥5 (for diploid tumours, 
with ASCAT ploidy <2.7n) or ≥9 (for tumours with evidence of a whole-genome 
duplication, with ASCAT ploidy ≥2.7n). Homozygous deletions were called if 
there were zero copies in the tumour cells. 
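
The calling rules above can be written as a short Python sketch. It reads the thresholds as "at least 5 copies" when ASCAT ploidy is below 2.7n and "at least 9 copies" otherwise; the function names are hypothetical and the inputs are assumed to be the integral copy number of a segment and the ASCAT ploidy estimate of the tumour.

```python
def call_amplification(copy_number, ploidy):
    """Call an amplification using the ploidy-dependent thresholds.

    Tumours with ploidy < 2.7 are treated as diploid (threshold 5 copies);
    tumours with ploidy >= 2.7 are treated as whole-genome duplicated
    (threshold 9 copies).
    """
    threshold = 5 if ploidy < 2.7 else 9
    return copy_number >= threshold


def call_homozygous_deletion(copy_number):
    """Call a homozygous deletion when the tumour cells carry zero copies."""
    return copy_number == 0


print(call_amplification(6, ploidy=2.1))   # diploid tumour, 6 copies
print(call_amplification(6, ploidy=3.4))   # genome-duplicated tumour, 6 copies
```

The higher threshold for genome-duplicated tumours keeps a segment at roughly twice the baseline copy number from being called as amplified.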

Identification of likely driver copy number variants. To identify likely driver 
copy number variants, we derived a conservatively generated list of frequently 
amplified regions in breast cancer from a previous study**. From the amplified 
regions in breast cancer obtained by GISTIC analysis of that study, those with a 
GISTIC Q-value of less than 10° were selected. Regions within 40 Mb of amp- 
lified regions with more significant Q-values were excluded, as many of these 
probably point to the same amplified target gene. This process generated seven 
focal, highly significantly amplified regions. These regions were annotated with 
their putative target genes where additional biological studies have indicated that 
they are the likely targets (ERBB2, CCND1, MYC, FGFR1/ZNF703, ZNF217, 
MDM2). Only the amplified region on chromosome 15 was not annotatable. 
Driver amplification of these seven focal regions in the 100 samples was called 
using the criteria above. Driver homozygous deletions were called if part or all of a 
homozygous deletion overlapped with a known recessive cancer gene from the 
Cancer Gene Census” or a newly discovered gene from this study. 
Estimation of the number of mutated copies. Allele-specific copy number 
estimates for point mutations and indels were obtained by integrating copy 
number and sequencing data. In a sample containing only tumour cells, the 
number of reads, $r$, with a mutation can be expressed as 

$$ r = \frac{n_{\mathrm{mut}}}{n_{\mathrm{locus}}}\, R \qquad (1) $$

In equation (1), $n_{\mathrm{locus}}$ is the copy number of the locus, 
$n_{\mathrm{mut}}$ is the number of mutated copies and $R$ is the total number 
of reads from that locus. In the case of a tumour sample consisting of a 
fraction of tumour cells $\rho$, infiltrated with a fraction of normal cells 
$1 - \rho$ (assumed to have two copies), equation (1) becomes 

$$ r = \frac{n_{\mathrm{mut}}\, R\, \rho}{\rho\, n_{\mathrm{locus}} + 2(1 - \rho)} $$

Hence, allele-specific copy number estimates for point mutations and indels can 
be obtained as 

$$ n_{\mathrm{mut}} = f_s\, \frac{1}{\rho} \left[ \rho\, n_{\mathrm{locus}} + 2(1 - \rho) \right] \qquad (2) $$

In equation (2), $f_s = r/R$ is the frequency of mutated reads observed in the 
sequencing data, and $\rho$ and $n_{\mathrm{locus}}$ can be obtained from the 
ASCAT copy number analysis. 

These copy number estimates of mutations were used to determine which 
mutations are likely subclonal: if $n_{\mathrm{mut}} \geq 0.8$, the mutation is 
called likely clonal and if $n_{\mathrm{mut}} < 0.8$, the mutation is called 
likely subclonal. 
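
A minimal Python sketch of this estimate follows. It computes the number of mutated copies from the mutated read count, the total read count, the tumour cell fraction and the locus copy number, then applies the 0.8 cut-off (read here as "at least 0.8" for clonal). Function and argument names are hypothetical and the read counts are invented for illustration.

```python
def mutated_copies(mut_reads, total_reads, tumour_fraction, locus_copy_number):
    """Estimate the number of mutated copies per tumour cell.

    Implements n_mut = f_s * (1 / rho) * (rho * n_locus + 2 * (1 - rho)),
    where f_s = mut_reads / total_reads and normal cells carry two copies.
    """
    if total_reads <= 0 or not 0 < tumour_fraction <= 1:
        raise ValueError("need total_reads > 0 and 0 < tumour_fraction <= 1")
    f_s = mut_reads / total_reads
    mixture_copies = tumour_fraction * locus_copy_number + 2 * (1 - tumour_fraction)
    return f_s * mixture_copies / tumour_fraction


def classify_clonality(n_mut, threshold=0.8):
    """Label a mutation from its estimated number of mutated copies."""
    return "likely clonal" if n_mut >= threshold else "likely subclonal"


# Toy example: 30 of 100 reads mutated, 60% tumour cells, two copies at the locus.
n_mut = mutated_copies(30, 100, tumour_fraction=0.6, locus_copy_number=2)
print(round(n_mut, 3), classify_clonality(n_mut))
```

In the toy example the estimate is one mutated copy per tumour cell, which is what a clonal heterozygous mutation at a two-copy locus would give.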

In the case of indels, reads with an insertion or deletion may not map as well as 
reads without insertions and deletions. Therefore, a procedure was followed to 
estimate f, for indels that was independent of ease of mapping. Reads were 
obtained by matching flanking sequence (10 bp on each side) around the indel, 
further filtered to exclude spurious matches. The mutated read frequency was 
subsequently calculated, accounting for the difference in sequence lengths with 
and without the indel: 


$$ f_s = \frac{r_{\mathrm{indel}} / (l_s - l_{\mathrm{indel}} + 1)}{r_{\mathrm{indel}} / (l_s - l_{\mathrm{indel}} + 1) + r_{\mathrm{normal}} / (l_s - l_{\mathrm{normal}} + 1)} \qquad (3) $$


In equation (3), $r_{\mathrm{indel}}$ and $r_{\mathrm{normal}}$ are the respective 
numbers of reads with and without the indel, $l_s$ is the read length (76 bp), 
and $l_{\mathrm{indel}}$ and $l_{\mathrm{normal}}$ are the respective lengths of 
the matching fragment in sequences with and without the indel. 
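
The length correction can be sketched in Python as follows. Each read count is divided by the number of positions at which a read of the given length could fully contain the matching fragment, and the mutated read frequency is the indel share of the corrected counts. The function name and the toy numbers (a 3-bp deletion with 10-bp flanks) are assumptions made for illustration.

```python
def indel_read_frequency(r_indel, r_normal, l_indel, l_normal, read_length=76):
    """Mutated read frequency for an indel, corrected for fragment length.

    A fragment of length l can be fully contained in a read of length l_s at
    (l_s - l + 1) offsets, so each read count is normalised by that number.
    """
    weight_indel = r_indel / (read_length - l_indel + 1)
    weight_normal = r_normal / (read_length - l_normal + 1)
    total = weight_indel + weight_normal
    if total == 0:
        raise ValueError("no reads match either fragment")
    return weight_indel / total


# Toy example: 3-bp deletion with 10-bp flanks on each side, so the matching
# fragment is 20 bp with the deletion and 23 bp without it.
print(indel_read_frequency(r_indel=20, r_normal=60, l_indel=20, l_normal=23))
```

When the two fragments have the same length the correction cancels and the result reduces to the plain proportion of indel-bearing reads.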

Detection of selection and oncogenicity in protein-coding genes. The overall 
significance of an excess of non-silent mutations was determined using the 
methods previously described”. The ranking of gene significances was determined 
using the following model. We let $s^i_{kg}$ denote the number of silent mutations, where $k$ 
indexes mutation type (C•G→A•T, C•G→T•A, C•G→G•C, T•A→A•T, 
T•A→C•G or T•A→G•C) in gene $g$, where $i = 1$ for the primary screen and $i = 2$ 
for the follow-up screen. We also have counts $m^i_{kg}$ and $n^i_{kg}$ of missense and nonsense 
mutations, respectively. Finally we have counts $I^i_g$ of indels. The numbers of screened 
bases, $S^i_{kg}$, $M^i_{kg}$ and $N^i_{kg}$, in each gene for each mutation type were also calculated. The 
total number of screened bases was $L^i_g$. We let $\rho_k$ represent the per-base passenger 
mutation prevalence and use $\gamma$ to denote the per-base passenger rate of indels. 

Next we assume that genes can be neutral to cancer, oncogenically triggered by 
missense mutations or inactivated by truncating mutations. Genes are not 
precluded from belonging to both of the last two categories. We assume that 
proportions $\alpha$ and $\beta$ of genes belong to the missense group and truncating group, 
respectively. Genes that belong to these groups have mutation rates that increase 
by factors $\lambda$ and $\mu$, respectively. These terms quantify the selection pressure for 
missense and truncating variants, respectively. This results in a mixture model 
with the following likelihood: 


$$ L = \prod_{g} \left\{ \prod_{i,k} \mathrm{Po}_{s^i_{kg}}\!\left(S^i_{kg}\rho_k\right) \left[ \sum_{m=1}^{2} \alpha_m \prod_{i,k} \mathrm{Po}_{m^i_{kg}}\!\left(M^i_{kg}\rho_k\lambda_m\right) \right] \left[ \sum_{n=1}^{2} \beta_n \prod_{i} \mathrm{Po}_{I^i_g}\!\left(L^i_g\gamma\mu_n\right) \prod_{k} \mathrm{Po}_{n^i_{kg}}\!\left(N^i_{kg}\rho_k\mu_n\right) \right] \right\} $$

Here $\alpha_1 = 1 - \alpha$, $\alpha_2 = \alpha$, $\beta_1 = 1 - \beta$, $\beta_2 = \beta$, $\lambda_1 = 1$, $\lambda_2 = \lambda$, $\mu_1 = 1$ and $\mu_2 = \mu$, and 
$\mathrm{Po}_c(r)$ indicates the Poisson probability of obtaining value $c$ from a Poisson process 
with rate parameter $r$. $G_F$ denotes the set of genes in the follow-up study; terms with 
$i = 2$ enter only for genes in $G_F$. The 
parameters for this model were then estimated with the expectation-maximization 
algorithm. Confidence intervals for these parameters were obtained using parametric 
bootstrapping. Conditional on these parameter estimates, we can then use Bayes’ law 
to calculate the probability that each gene belongs to the neutral, the missense or the 
truncating group. Specifically, if $\varphi_g, \psi_g \in \{1, 2\}$ index whether the gene $g$ does or does 
not belong to the missense or truncating group, respectively, we have 


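$$ \begin{aligned} \Pr(\varphi_g = m, \psi_g = n) \propto{} & \mathrm{Po}_{I^1_g}\!\left(L^1_g\gamma\mu_n\right) \prod_k \mathrm{Po}_{s^1_{kg}}\!\left(S^1_{kg}\rho_k\right) \mathrm{Po}_{m^1_{kg}}\!\left(M^1_{kg}\rho_k\lambda_m\right) \mathrm{Po}_{n^1_{kg}}\!\left(N^1_{kg}\rho_k\mu_n\right) \\ & \times \mathrm{Po}_{I^2_g}\!\left(L^2_g\gamma\mu_n\right) \prod_k \mathrm{Po}_{s^2_{kg}}\!\left(S^2_{kg}\rho_k\right) \mathrm{Po}_{m^2_{kg}}\!\left(M^2_{kg}\rho_k\lambda_m\right) \mathrm{Po}_{n^2_{kg}}\!\left(N^2_{kg}\rho_k\mu_n\right) \end{aligned} $$


The probability of belonging to either the missense or the truncating group, 
$1 - \Pr(\varphi_g = 1, \psi_g = 1)$, was then used to rank the genes. 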
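
The Bayes step can be illustrated with a deliberately simplified Python sketch. It is not the authors' implementation: it considers a single mutation class (missense only) in a single screen, and weights each group by its mixing proportion as a standard Bayes update would. The function names and all parameter values are hypothetical.

```python
import math


def poisson_logpmf(count, rate):
    """Log of the Poisson probability of observing `count` events at `rate` (> 0)."""
    return count * math.log(rate) - rate - math.lgamma(count + 1)


def posterior_missense_group(m, screened_bases, rho, lam, alpha):
    """Posterior probability that a gene belongs to the missense-selected group.

    Neutral genes have expected count screened_bases * rho; selected genes have
    that expectation multiplied by the selection factor `lam`. `alpha` is the
    prior proportion of selected genes.
    """
    log_neutral = math.log(1 - alpha) + poisson_logpmf(m, screened_bases * rho)
    log_selected = math.log(alpha) + poisson_logpmf(m, screened_bases * rho * lam)
    top = max(log_neutral, log_selected)          # subtract the max for numerical stability
    w_neutral = math.exp(log_neutral - top)
    w_selected = math.exp(log_selected - top)
    return w_selected / (w_neutral + w_selected)


# Toy example: 8 missense mutations where about 1 is expected under neutrality.
print(posterior_missense_group(m=8, screened_bases=1000, rho=1e-3, lam=5.0, alpha=0.05))
```

Genes would then be ranked by this posterior, with the full model combining the missense and truncating components across both screens.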

Generalized linear models. Generalized linear models (GLMs) are extensions to 
ordinary linear regression that model underlying distributions using members of 
the exponential family”*. The response variable is related to the linear model by a 
link function using maximum-likelihood estimates of the parameters. Because 
they are not restricted to modelling normally distributed data, GLMs have par- 
ticular utility in modelling count data such as, in this manuscript, the number of 
mutations. 

If mutations were generated by a random process, with a constant probability of 
occurring at any point throughout an individual's life, we would expect the num- 
ber of mutations to have a Poisson distribution, dependent only on the (unknown) 
rate of mutation and the age of the individual. Where goodness-of-fit tests indi- 
cated that the Poisson distribution was an appropriate model for the number of 
mutations, we used this distribution. However, in the models where goodness-of- 
fit tests indicated that mutation numbers were overdispersed, we used negative 
binomial distributions in place of Poisson distributions, as the negative binomial 
distribution incorporates an additional parameter that allows the adjustment of 
the variance of the distribution independently of its mean. 
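
The choice between the two distributions can be illustrated with a rough Python sketch. It replaces the goodness-of-fit tests used in the study with a cruder variance-to-mean ratio, which equals about 1 for Poisson counts and exceeds 1 under overdispersion. The function names, the cut-off of 1.5 and the toy counts are assumptions for illustration only.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 for Poisson-distributed counts."""
    return variance(counts) / mean(counts)


def choose_family(counts, cutoff=1.5):
    """Pick a count distribution from the dispersion index (heuristic, not a formal test)."""
    return "poisson" if dispersion_index(counts) <= cutoff else "negative binomial"


even_counts = [3, 4, 5, 4, 4]          # toy data with little spread
skewed_counts = [1, 2, 40, 3, 100]     # toy data with a long right tail
print(choose_family(even_counts), choose_family(skewed_counts))
```

A formal analysis would instead fit the model and test the residual deviance, as the text describes.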

GLMs were implemented using the glm and glm.nb functions in R. The 
predictor variables were {age, tumour grade, tubule score, pleomorphism score, 
mitotic score, mitotic count}, each of which was used within a two-factor model, 
with oestrogen receptor status as the second predictor variable. The response 
variable was the number of mutations of a particular type, from the set 
{substitutions + indels, substitutions, indels, copy number amplifications, C→T at CpG 
mutations, all driver mutations}. 
Evaluation of strand bias in tumours displaying the mutator phenotype. To 
assess whether there was a strand bias of C→X (C→T, C→G and C→A) 
mutations in PD4120 and the other tumours showing the mutator phenotype, 
we first estimated the expected ratio of cytosines found in transcribed and 
untranscribed strands, by random sampling of 20,000 CCDS exons from Ensembl 
version 61. A χ-squared test was then used to examine whether the C→X mutations 
observed in each sample differed significantly from this ratio. Similar tests were 
conducted on the combined mutations from all mutator phenotype samples and 
on all mutator phenotype samples except PD4120. 
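
The strand-bias test can be sketched in Python as a one-degree-of-freedom goodness-of-fit test. The sketch assumes that the expected fraction of cytosines on the transcribed strand has already been estimated; the function name and the toy counts are hypothetical. For one degree of freedom the χ-squared tail probability equals erfc(sqrt(x / 2)), so only the standard library is needed.

```python
import math


def strand_bias_test(transcribed, untranscribed, expected_transcribed_fraction):
    """Chi-squared goodness-of-fit test (1 d.f.) for strand bias.

    Compares observed C->X mutation counts on the two strands with the counts
    expected from the fraction of cytosines on the transcribed strand.
    Returns (chi2 statistic, P value).
    """
    total = transcribed + untranscribed
    expected_t = total * expected_transcribed_fraction
    expected_u = total * (1 - expected_transcribed_fraction)
    chi2 = ((transcribed - expected_t) ** 2 / expected_t
            + (untranscribed - expected_u) ** 2 / expected_u)
    p_value = math.erfc(math.sqrt(chi2 / 2))   # survival function of chi-squared, 1 d.f.
    return chi2, p_value


chi2, p = strand_bias_test(60, 40, expected_transcribed_fraction=0.5)   # toy counts
print(f"chi2 = {chi2:.2f}, P = {p:.4f}")
```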
Breast carcinoma is the leading cause of cancer-related mortality in 
women worldwide, with an estimated 1.38 million new cases and 
458,000 deaths in 2008 alone’. This malignancy represents a 
heterogeneous group of tumours with characteristic molecular fea- 
tures, prognosis and responses to available therapy” *. Recurrent 
somatic alterations in breast cancer have been described, including 
mutations and copy number alterations, notably ERBB2 amplifica- 
tions, the first successful therapy target defined by a genomic aber- 
ration’. Previous DNA sequencing studies of breast cancer genomes 
have revealed additional candidate mutations and gene rearrange- 
ments*"°. Here we report the whole-exome sequences of DNA from 
103 human breast cancers of diverse subtypes from patients in 
Mexico and Vietnam compared to matched-normal DNA, together 
with whole-genome sequences of 22 breast cancer/normal pairs. 
Beyond confirming recurrent somatic mutations in PIK3CA", 
TP53°, AKT1'’, GATA3"* and MAP3K1", we discovered recurrent 
mutations in the CBFB transcription factor gene and deletions of its 
partner RUNX1. Furthermore, we have identified a recurrent 
MAGI3-AKT3 fusion enriched in triple-negative breast cancer 
lacking oestrogen and progesterone receptors and ERBB2 expres- 
sion. The MAGI3-AKT3 fusion leads to constitutive activation 
of AKT kinase, which is abolished by treatment with an ATP- 
competitive AKT small-molecule inhibitor. 

Breast cancers are classified according to gene-expression subtypes: 
luminal A, luminal B, Her2-enriched (Her2 is also known as ERBB2), 
and basal-like’*. Luminal subtypes are associated with expression of 
oestrogen and progesterone receptors and differentiated luminal epi- 
thelial cell markers. The subtypes differ in genomic complexity, key 
genetic alterations and clinical prognosis**'*. To discover genomic 
alterations in breast cancers, we performed whole-genome and 
whole-exome sequencing of 108 primary, treatment-naive, breast 
carcinoma/normal DNA pairs from all major expression subtypes 
(Table 1 and Supplementary Tables 1-3), 17 cases by whole-exome 
and whole-genome sequencing, 5 cases by whole-genome sequencing 
alone, and 86 cases by whole-exome sequencing alone. 

In total, whole-exome sequencing was performed on 103 tumour/ 
normal pairs, 54 from Mexico and 49 from Vietnam, targeting 189,980 
exons comprising 33 megabases (Mb) of the genome and with a median 
of 85.1% of targeted bases covered at least 30-fold across the sample set. 
This analysis revealed a total of 4,985 candidate somatic substitutions 
(see https://confluence.broadinstitute.org/display/CGATools/MuTect 
for methods and data sets) and insertions/deletions (indels, see 
https://confluence.broadinstitute.org/display/CGATools/Indelocator for methods) 
in the target protein-coding regions and the adjacent splice sites, 
ranging from 14 to 307 putative events in individual samples (Sup- 
plementary Table 4). These mutations represented 3,153 missense, 
1,157 silent, 242 nonsense, 97 splice site, 194 deletions, 110 insertions 
and 32 other mutations (Supplementary Table 5). The total mutation 
rate was 1.66 per Mb (range 0.47-10.5) with a non-silent mutation rate 
of 1.27 per Mb (range 0.31-8.05), similar to previous reports in breast 
carcinoma®’. The mutation rate in breast cancer exceeds that of 
haematologic malignancies and prostate cancer, but is significantly 
lower than in lung cancer and melanoma’®’*’. The most common 


Table 1 | Sample collections successfully completed sequencing and analysis 

Patients                          Mexico N = 56     Vietnam N = 52 
Median age (range)                54 (37-92)        48 (31-81) 
Source of normal DNA              Blood             Adjacent tissue 
Pathology subtype (per cent) 
  Ductal                          46 (82%)          41 (79%) 
  Lobular                         4 (7%)            0 (0%) 
  DCIS                            0 (0%)            9 (17%) 
  Other                           6 (11%)*          2 (4%)† 
Stage 
  0                               0 (0%)            9 (17%) 
  I                               8 (14%)           3 (6%) 
  II                              36 (64%)          31 (60%) 
  III                             12 (21%)          9 (17%) 
Expression subtype (per cent)‡ 
  Luminal A                       24 (43%)          14 (27%) 
  Luminal B                       13 (23%)          9 (17%) 
  Her2                            9 (16%)           12 (23%) 
  Basal                           5 (9%)            8 (15%) 
  Unknown                         2 (4%)            3 (6%) 
  Normal-like                     3 (5%)            6 (11%) 

* Includes tubular carcinoma, medullary carcinoma, mucinous carcinoma and mixed carcinoma (3). 
† Includes mucinous carcinoma (2). 
‡ Based on PAM-50 classification. 
DCIS, ductal carcinoma in situ. 

Figure 1 | Most significantly mutated genes in breast cancer as determined 
by whole-exome sequencing (n= 103). Upper histogram, rates of sample- 
specific mutations (substitutions and indels). Green, synonymous; blue, non- 
synonymous. Left histogram, number of mutations per gene and percentage of 
samples affected (colour coding as in upper histogram). Central heat map, 
distribution of significant mutations across sequenced samples (‘Other 


mutation events observed are C to T transition events in CpG dinu- 
cleotides (Fig. 1 and Supplementary Fig. 4). 

We performed validation experiments on 494 candidate mutations 
(representing all significantly mutated genes and genes in significantly 
mutated gene sets) using a combination of mass-spectrometric geno- 
typing, 454 pyrosequencing, Pacific Biosciences sequencing and 
Illumina sequencing of matched formalin-fixed paraffin-embedded 
tissue, and confirmed the presence of 94% of protein-altering point 
mutations (Supplementary Table 4 and Supplementary Fig. 5); this 
validation rate is consistent with previous results that 95% of point 
mutations can be validated with orthogonal methods'*”’. Only 18 out 
of 39 (46%) indels among significantly mutated genes were confirmed. 
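
With only 39 indels tested, the 46% confirmation rate carries considerable sampling uncertainty. As an illustration, the sketch below computes the proportion together with a Wilson score interval; the choice of the Wilson interval is an assumption made here for illustration and is not a method described in the text.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Counts from the text: 18 of 39 indels in significantly mutated genes confirmed.
confirmed, tested = 18, 39
low, high = wilson_interval(confirmed, tested)
print(f"{confirmed}/{tested} = {confirmed / tested:.0%} (interval {low:.0%} to {high:.0%})")
```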

Six genes were found to be mutated with significant recurrence in 
the 103 whole-exome sequenced samples, by analysis with the MutSig 
algorithm'*’” (https://confluence.broadinstitute.org/display/CGATools/MutSig)
at a false discovery rate (FDR) < 0.1 after correction for multiple
hypothesis testing (Supplementary Table 6a), manual review of
reads, and subsequent orthogonal confirmation of somatic events 
(Fig. 1 and Supplementary Fig. 6). One gene, CBFB, is identified for 
the first time as a significantly mutated gene in breast cancer or any 
other epithelial cancer, to our knowledge, whereas the other five genes 
(TP53, PIK3CA, AKT1, GATA3 and MAP3K1) have previously been 
reported as mutated in breast cancer”’*”*. This significantly mutated 
genes list, as any list produced by a statistical method, is probably 
incomplete and reflects the statistical power of our cohort size—larger 
sample sets will provide further statistical power. 
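
To make the FDR < 0.1 threshold concrete, the sketch below applies the Benjamini-Hochberg procedure to a list of per-gene p values and keeps genes whose adjusted q value falls below 0.1. This is an assumption for illustration: the text does not state which correction MutSig applies, and the gene labels and p values here are hypothetical.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg q values in the same order as the input p values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that q values are monotone in rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical per-gene p values (not from the paper).
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
p_vals = [0.001, 0.008, 0.039, 0.041, 0.60]
q_vals = benjamini_hochberg(p_vals)
significant = [g for g, q in zip(genes, q_vals) if q < 0.1]
print(significant)
```

Because the adjustment scales each p value by the number of genes tested, a larger cohort (which yields smaller p values for true drivers) is what allows more genes to pass the same threshold.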

Somatic mutations in TP53 and PIK3CA were each present in 27% 
of samples, consistent with published frequencies'®”° (Fig. 1). TP53 
mutations occur in samples with a higher mutation rate (t-test 
P = 0.0079 comparing samples with mutation rates greater than or
less than the median 1.66 mutations per Mb) and were distributed
across the gene in sites reported in COSMIC
(http://www.sanger.ac.uk/genetics/CGP/cosmic/). Also, using the ABSOLUTE algorithm for
determining allele-specific copy number”’, we observed that 21 out of 
31 TP53 mutations were homozygous (Supplementary Table 4). 
PIK3CA mutations were clustered in the helical (amino acids 542/ 
545; 40%) and kinase domains (amino acid 1047; 47%)”. Six samples 
harboured the AKT1 E17K mutation that alters the pleckstrin- 
homology (PH) domain and leads to activation of the kinase’. 
non-synonymous’ mutations: nonsense, indel and splice-site). Right histogram,
−log10 score of MutSig q value. Red line at q = 0.1. Lower chart: top, rates of
non-silent mutations within categories indicated by legend; bottom, key 
molecular features of samples in each column. DCIS, ductal carcinoma in situ; 
Duct., infiltrating ductal carcinoma; Lob., infiltrating lobular carcinoma; Lum, 
luminal. 


AKT1 and PIK3CA mutations, which activate the phosphatidylinositol-
3-kinase (PI3K) pathway, were mutually exclusive in our data set.
MAP3K1, recently reported as mutated in oestrogen-receptor-positive 
breast cancers’’, harboured five mutations in three patients with 
oestrogen-receptor-positive disease, and followed a pattern consistent 
with positive selection for recessive inactivation of the gene. In total, 
two frameshift, two nonsense and one missense mutation, combined 
with a homozygous deletion spanning the coding region were 
observed. Although the point mutations seemed to be heterozygous 
by copy-number analysis, two patients harboured dual mutations, 
consistent with compound heterozygous inactivation, although con- 
firmatory phasing data were not available. The GATA3 transcription 
factor gene harboured mutations in four patients with luminal 
tumours, including three previously unknown frameshift mutations 
near the 3’-end of the coding sequence. We also identified one previ- 
ously described splice-site mutation that disrupts zinc-finger domains 
in GATA3 required for DNA binding”. 

CBFB, encoding the core-binding-factor beta subunit, was mutated 
in four oestrogen-receptor-positive samples, with one nonsense muta- 
tion and three truncating frameshift mutations (Fig. 2a). CBFB somatic 
mutations have been noted in isolated cases of breast cancer®””. This is 
the first report of these mutations recurring at a significant rate above 
background; the sample size is not sufficient to determine whether 
these mutations are specific for oestrogen-receptor-positive subtypes. 
CBFB encodes the non-DNA-binding component of a heterodimeric 
protein complex, together with the DNA-binding RUNX proteins 
encoded by RUNX1, RUNX2 and RUNX3. Copy-number analysis, 
using the ABSOLUTE algorithm”’, provides further evidence for loss 
of function of the RUNX1/CBFB complex in breast cancer: the cases 
with CBFB mutations seem to have hemizygous deletions of one par- 
ental allele, whereas two additional cases harbour homozygous dele- 
tions of RUNX1 (Fig. 2b, c and Supplementary Figs 7 and 8). 
Oncogenic rearrangements of RUNX1 or CBFB are common in acute 
myeloid leukaemia*”’ (including the CBFB-MYH11 translocation 
believed to have dominant negative function”). This is to our know- 
ledge the first report of inactivation of this transcription factor com- 
plex in epithelial cancers. 

Significance analysis restricted to somatic mutations in genes 
reported in COSMIC revealed three significantly mutated genes, 
Figure 2 | CBFB mutations and RUNX1 deletions. a, CBFB coding region
diagram, RUNX-binding domain in green. Mutations identified in this study 
(red bullets), previously identified mutations®’® (black bullets), and known 
CBFB-MYH11 fusion indicated. b, Allelic copy ratios for the 3-Mb region 
surrounding RUNX1 in samples BR-M-045 and BR-M-174. Dots indicate 
copy-ratios for individual SNP alleles. Red, higher copy-ratio allele for 
informative SNPs that are heterozygous in matched normal DNA; blue, 


PIK3CA, TP53 and ERBB2, the latter below the significance threshold 
in the complete analysis (Supplementary Table 7). ERBB2 contained 
somatic mutations in three samples, with two being identical S310F 
mutations (these two samples are distinct on the basis of their germline 
and somatic genotypes). The S310F mutation can activate ERBB2 and 
is transforming in vitro (personal communication from H. Greulich). 
Neither sample with the S310F activating mutation has ERBB2 
amplification (Supplementary Fig. 9). The two samples belong to the 
Her2-enriched and luminal B subtypes, which typically have ERBB2 
amplification; this supports the notion that the observed mutations 
have a driving role in these tumours’. 

To identify candidate genomic rearrangements, we applied the 
dRanger algorithm’®”” to the 22 cases with paired tumour/normal 
whole-genome sequencing data (Supplementary Table 8). The rate 
of rearrangements ranged from a median of 30 rearrangements per 
sample in the luminal A subtype (range 0-218) to the basal-like and 
Her2-enriched subtypes with a median of 237 and 246 rearrange- 
ments, respectively (Supplementary Fig. 10); the rates are similar to 
a recent report’’. We performed polymerase chain reaction (PCR) 


[Figure 2 panels b and c (sample BR-M-045): y axis, copy numbers; x axis, genomic fraction.]

lower-copy ratio SNPs; grey, uninformative SNPs (homozygous in matched 
normal). Lines indicate inferred segmental copy-ratios. Red, higher-copy 
segment; blue, lower-copy segment; purple, equal-copy segment. c, Histogram 
depicting bins of segmented copy number (y axis), with inferred integral copies 
shown by dotted lines; the length of each horizontal block corresponds to the 
fraction of the haploid genome at the copy number level, or ‘genomic fraction’ 
(x axis). 


amplification on a subset of the candidate rearrangements (Sup- 
plementary Methods) and confirmed 89 out of 165 events (54%). No 
rearrangement was seen in more than one sample (Supplementary 
Table 8). In addition, we did not identify rearrangements previously 
observed by DNA sequencing” nor by complementary DNA (cDNA)- 
sequencing, including MAST and NOTCH family-gene fusions”. 
The discovery of recurrent driver rearrangements in other epithelial 
cancers”®”’ led to a closer examination of the list of confirmed re- 
arrangements. In a triple-negative, basal-like subtype tumour, we 
observed a rearrangement between the genes MAGI3 (membrane- 
associated guanylate kinase, WW and PDZ domain containing 3) on 
chromosome 1p and AKT3 (v-akt murine thymoma viral oncogene 
homologue 3) on chromosome 1q, resulting in a balanced translocation
from intron 9 in MAGI3 to intron 1 of AKT3 (Fig. 3a). The
previously unknown fusion genes were confirmed in tumour DNA 
by sequencing the product of PCR amplification (Fig. 3b). The 
MAGI3 disruption is complemented by a hemizygous deletion of the 
other allele (Supplementary Fig. 11a). The expression levels of indi- 
vidual exons of MAGI3 and AKT3 correspond to the predicted 
Figure 3 | MAGI3-AKT3 fusion gene. a, Diagram of balanced translocation
between MAGI3 and AKT3. b, Top, genomic DNA PCR for AKT3, MAGI3 and 
both fusion products in tumour (T) and normal (N). Bottom, cDNA PCR of 
fusion gene in tumour. c, Above, MAGI3 and AKT3 protein domains; below, 
putative fusion protein. d, Immunoblots of lysates from ZR-75 cells transfected 
with vector, MAGI3-AKT3 fusion, or AKT1 E17K mutant, grown in low-serum 
media, for the indicated antibodies. Left, infected cells with and without insulin-like
growth factor 1 (IGF-1) stimulation; right, treatment of vector or MAGI3-
AKT3 overexpressing cells with AKT inhibitors MK-2206 and GSK-690693. 
e, Focus formation assays with Rat-1 cells expressing pLX control or MAGI3- 
AKT3, and stained with crystal violet. 


5'-MAGI3-AKT3-3' fusion (Supplementary Fig. 11b), with this 
sample having the highest AKT3 expression in the data set. 
Expression of the fusion gene was confirmed in the tumour sample 
by PCR amplification of the cDNA (Fig. 3b). 

The rearrangement produces an in-frame fusion gene with a predicted 
MAGI3-AKT3 fusion protein that combines MAGI3 lacking the second 
PDZ domain, reported to bind to PTEN and be required for the inhib- 
itory effect of PTEN on the PI3K pathway”, together with an AKT3 
region that retains an intact kinase domain but has a disruption of the 
pleckstrin homology domain before the glutamate at position 17 (Fig. 3c). 
AKT3 shares significant homology to AKT1 and is reported to be the
dominant AKT family member expressed in hormone-receptor-negative 
breast cancers”’. Together, the MAGI3-AKT3 translocation and deletion 
of MAGI3 could result in the combined loss of function of a tumour 
suppressor gene (PTEN) and activation of an oncogene (AKT3). 

To evaluate oncogenic activity of the MAGI3-AKT3 fusion, we 
expressed the fusion gene ectopically in ZR-75 cells. The MAGI3- 
AKT3 fusion protein is constitutively phosphorylated at serine 473 
in the AKT3 kinase domain (numbered according to the wild-type 
protein) in the absence of growth factors (Fig. 3d); ectopically 
expressed AKT1 with an engineered E17K mutation is likewise con- 
stitutively phosphorylated (Fig. 3d), as previously reported'?. Con- 
stitutive activation of the MAGI3-AKT3 kinase in turn activates 
downstream pathways as demonstrated by phosphorylation of 
GSK3B, an AKT substrate (Fig. 3d). Phosphorylation of GSK3B by 
the MAGI3-AKT3 fusion can be inhibited with an ATP-competitive 
small molecule AKT inhibitor, GSK-690693, but not with an allosteric
AKT inhibitor, MK-2206, that interacts with the PH domain of AKT 
(Fig. 3d). Overexpression of the MAGI3-AKT3 fusion gene in Rat-1 
fibroblast cell lines led to loss of contact inhibition and focus formation 
(Fig. 3e). 

We screened 235 additional breast cancer samples for the presence 
of the 5’-MAGI3-AKT3-3’ fusion event by PCR with reverse tran- 
scription (RT-PCR) of cDNA followed by Sanger sequencing of break- 
points. The fusion was present in 8 of the 235 samples, including 5 out 
of 72 triple-negative (oestrogen-receptor-, progesterone-receptor- and 
Her2-negative) samples (Supplementary Fig. 12). 

The power provided by whole-genome and whole-exome sequen- 
cing of a relatively large and diverse breast cancer sample set has 
enabled several significant discoveries, including the identification of 
recurrent inactivating mutations in CBFB and of a recurrent translocation
of MAGI3-AKT3. The mutations in CBFB, RUNX1 and GATA3
suggest the importance of understanding epithelial cell differentiation 
and its regulatory transcription factors in breast cancer pathogenesis. 
The recurrent genomic fusion involving AKT3 suggests that the use of 
ATP-competitive AKT inhibitors should be evaluated in clinical trials 
for the treatment of fusion-positive triple-negative breast cancers, a 
subtype where limited therapeutic options exist beyond systemic cyto- 
toxic chemotherapy. 


METHODS SUMMARY 


All samples were obtained under institutional IRB approval and with documented 
informed consent. Breast cancer specimens from Mexico were paired with peri- 
pheral blood normal DNA whereas the Vietnamese samples were paired with DNA 
from normal adjacent breast tissue. Tumour RNA for each case was analysed on 
exon arrays to determine breast cancer expression subtype using the PAM50 classification
method, whereas tumour/normal DNA pairs were analysed for copy
number, allelic imbalance, and ancestry using single nucleotide polymorphism 
(SNP) arrays. A total of 108 samples, 17 both whole-genome sequencing and 
whole-exome sequencing, 86 whole-exome sequencing only, and 5 whole-genome 
sequencing only, passed initial qualification metrics, library construction, and
successfully achieved desired sequencing depth (100× whole-exome sequencing; 30×
whole-genome sequencing) on the Illumina sequencing platform (Supplementary
Figs 1-3, Supplementary Tables 2 and 3). Tumour-specific point mutations, small 
insertions/deletions (indels), and rearrangements were detected by comparing 
tumour DNA to its paired normal DNA and using a series of algorithms to identify 
somatic events (Supplementary Fig. 2)'*'”. Additional mutation calling was per- 
formed separately on tumour and normal DNA to identify germline mutation 
events that may confer susceptibility to breast carcinoma. Allele-specific copy 
number of each gene/mutation was determined using the HAPSEG and 
ABSOLUTE analysis methods. Confirmation of point mutations and indels was 
performed using mass-spectrometry-based genotyping and orthogonal next- 
generation sequencing methods, whereas putative in-frame genomic rearrange- 
ments were PCR-amplified from DNA to confirm the presence of the event. 
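
The cohort sizes quoted in different parts of the text (108 samples in total, 103 whole-exome sequenced samples, 22 cases with whole-genome data) follow from the three sequencing categories given above. A short consistency check, with variable names chosen here for illustration:

```python
# Counts stated in the Methods Summary.
both_wgs_and_wes = 17
wes_only = 86
wgs_only = 5

total_samples = both_wgs_and_wes + wes_only + wgs_only
wes_samples = both_wgs_and_wes + wes_only   # every sample with exome data
wgs_samples = both_wgs_and_wes + wgs_only   # every sample with genome data

print(total_samples, wes_samples, wgs_samples)  # prints: 108 103 22
```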

A complete description of the materials and methods is provided in the 
Supplementary Information. Access to the data and computational algorithms 
used in this study can be found at
https://confluence.broadinstitute.org/display/CGATools/Home.
Astrocyte glypicans 4 and 6 promote formation of
excitatory synapses via GluA1 AMPA receptors


Nicola J. Allen, Mariko L. Bennett, Lynette C. Foo, Gordon X. Wang, Chandrani Chakraborty, Stephen J. Smith
& Ben A. Barres


In the developing central nervous system (CNS), the control of 
synapse number and function is critical to the formation of neural 
circuits. We previously demonstrated that astrocyte-secreted factors 
powerfully induce the formation of functional excitatory synapses 
between CNS neurons’. Astrocyte-secreted thrombospondins 
induce the formation of structural synapses, but these synapses 
are postsynaptically silent’. Here we use biochemical fractionation 
of astrocyte-conditioned medium to identify glypican 4 (Gpc4) and 
glypican 6 (Gpc6) as astrocyte-secreted signals sufficient to induce 
functional synapses between purified retinal ganglion cell neurons, 
and show that depletion of these molecules from astrocyte- 
conditioned medium significantly reduces its ability to induce 
postsynaptic activity. Application of Gpc4 to purified neurons is 
sufficient to increase the frequency and amplitude of glutamatergic 
synaptic events. This is achieved by increasing the surface level 
and clustering, but not overall cellular protein level, of the GluA1 
subunit of the AMPA (α-amino-3-hydroxy-5-methyl-4-isoxazole
propionic acid) glutamate receptor (AMPAR). Gpc4 and Gpc6 are 
expressed by astrocytes in vivo in the developing CNS, with Gpc4 
expression enriched in the hippocampus and Gpc6 enriched in the 
cerebellum. Finally, we demonstrate that Gpc4-deficient mice have 
defective synapse formation, with decreased amplitude of excitatory 
synaptic currents in the developing hippocampus and reduced 
recruitment of AMPARs to synapses. These data identify glypicans 
as a family of novel astrocyte-derived molecules that are necessary 
and sufficient to promote glutamate receptor clustering and recep- 
tivity and to induce the formation of postsynaptically functioning 
CNS synapses. 

To understand how astrocytes regulate functional synapse formation, 
we examined postsynaptic function, AMPAR levels and AMPAR 
localization at synapses between purified retinal ganglion cells (RGCs) 
cultured alone or with a feeder layer of astrocytes (Supplementary 
Fig. 1a-d). Astrocytes strengthen individual excitatory glutamatergic
synapses in RGCs, as shown by increased frequency and amplitude of
miniature excitatory postsynaptic currents’ (mEPSCs) (Fig. 1a-c). In
RGCs, mEPSCs are mediated purely by AMPARs, composed of combinations
of four subunits, GluA1 to GluA4, forming tetramers'”.
Astrocytes do not greatly alter total AMPAR levels in RGCs (except 
for a small significant increase in GluA4), and thus do not induce the 
synthesis of new AMPARs or block the degradation of existing receptors 
(Fig. 1d, e). Astrocytes do, however, increase surface levels of all AMPAR 
subunits on RGCs by a factor of three, as shown by surface biotinylation 
and quantitative western blotting (Fig. 1f, g). Surface staining for GluA1- 
containing AMPARs demonstrated that the increased surface receptors 
are clustered together in puncta throughout the dendrites (Fig. 1h-j).
These results demonstrate that astrocyte-derived signals lead to 
increased surface levels and clustering of pre-existing AMPARs. 

We previously identified thrombospondins and hevin as astrocyte- 
secreted proteins sufficient to induce structural synapse formation, but 


the synapses so formed are postsynaptically silent because they lack 
AMPARs™. Therefore, we used biochemistry to identify the astrocyte- 
secreted factor that is sufficient to induce functional synapse forma- 
tion. Astrocytes were maintained in minimal medium and the factors 
they secreted collected as astrocyte-conditioned medium (ACM), 
which was concentrated and fed to RGCs, and was sufficient to induce 
functional synapse formation as assessed by electrophysiological 
recording of total synaptic activity and mEPSCs (Supplementary Fig. 1c,
e-g). Analysis of ACM by two-dimensional electrophoresis revealed 
that hundreds of proteins were present, and size-exclusion experiments 
demonstrated the activity factor to be relatively large, between 100 and 
300 kDa (Supplementary Fig. 2). To narrow down candidate factors 
present in ACM, we conducted affinity column fractionation, initially 
using individual columns and then combining them in series 
(Supplementary Fig. 3). In the final fractionation scheme, the unbound 
proteins were taken from a heparin column, bound and eluted from an 
anion column, and then bound and eluted from a hydrophobic inter- 
action column. This final eluted protein fraction was fed to RGCs, and 
as it was unclear whether the activity factor would be sufficient to 
induce synapses directly, thrombospondin was included to induce 
structural synapse formation. This final fraction was sufficient to 
induce a large increase in synaptic activity (Fig. 1k), contained 1% of 
the starting protein (Supplementary Table 1) and was sixfold enriched 
for functional activity. This fraction was analysed by mass spectrometry 
and contained approximately 25 candidate factors (Supplementary 
Table 2). 
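
The enrichment figure above can be read as a ratio of specific activities (functional activity per unit protein) between the final fraction and the starting medium. The sketch below shows that calculation under this interpretation, which is an assumption; the units and input values are hypothetical and chosen only so that a fraction holding 1% of the protein comes out sixfold enriched.

```python
def fold_enrichment(activity_fraction: float, protein_fraction: float,
                    activity_start: float, protein_start: float) -> float:
    """Ratio of specific activity (activity per unit protein): fraction vs. start."""
    specific_start = activity_start / protein_start
    specific_fraction = activity_fraction / protein_fraction
    return specific_fraction / specific_start


# Hypothetical values: 100 mg protein and 1000 activity units at the start;
# the final fraction holds 1 mg protein (1%) and 60 activity units.
enrichment = fold_enrichment(activity_fraction=60, protein_fraction=1,
                             activity_start=1000, protein_start=100)
print(f"{enrichment:.1f}-fold enriched")
```

Under this reading, a sixfold enrichment in 1% of the protein corresponds to recovering roughly 6% of the starting activity in the final fraction.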

To identify which candidate protein was sufficient to enhance 
synaptic activity, we overexpressed them in COS-7 cells (which do 
not secrete endogenous synaptogenic factors (Supplementary Fig. 4)) 
and fed the conditioned medium to RGCs along with thrombospondin. 
Most of the candidates lacked activity, whereas medium conditioned by 
Gpc4-expressing COS-7 cells was sufficient to induce a large increase in 
synaptic activity (Supplementary Fig. 4 and Fig. 1l). Western blotting of
conditioned media from RGCs, cultured astrocytes and immuno- 
panned astrocytes (that closely resemble in vivo mature astrocytes”) 
demonstrated that astrocytes and not neurons secrete Gpc4 in vitro 
(Supplementary Fig. 5). Glypicans are a conserved family of heparan 
sulphate proteoglycans, with six members (Gpc1 to Gpc6) in mammals,
and Gpc4 is homologous to Drosophila Dally-like®. The 63-kDa core 
protein is heavily glycosylated and can be in excess of 200 kDa in mass. 
Glypicans are tethered to the extracellular face of the plasma membrane 
of cells by GPI (glycosyl phosphoinositide) linkages’*, which can be 
cleaved by endogenous phospholipases, thus releasing the protein. 

Having demonstrated that Gpc4 is sufficient to enhance total synaptic 
activity, we asked whether this was due to effects on postsynaptic 
strengthening of synapses. In subsequent experiments, RGCs were 
treated with purified Gpc4 (Supplementary Fig. 6a, b), in the absence 
of thrombospondin, to assess specific effects of Gpc4 on synapse 
formation and function. Gpc4 was sufficient to strengthen individual 
Figure 1 | Astrocyte signals strengthen synapses by recruitment of surface
AMPARs. a-c, Example mEPSC recordings (a) show that frequency (b) and
amplitude (c) are significantly increased in RGCs cultured with astrocytes.
Average mEPSC amplitude: 12.2 ± 0.5 pA (RGCs alone, N = 13 cells),
20.4 ± 1.7 pA (astrocytes, N = 14, P < 0.0002). d, e, Astrocytes do not alter
total AMPAR levels in RGCs. Western blots (d) of RGC lysates for AMPAR
subunits GluA1 (G1), GluA2 (G2) and GluA4 (G4), and β-actin loading
control, and quantification of band intensity relative to RGC alone (e). N = 7
experiments. f, g, Astrocytes increase surface AMPARs in RGCs. Western blots
(f) of surface AMPAR subunits GluA1, GluA2 and GluA4, and β-actin loading
control from total lysate (same experiment as d), and quantification of band
intensity relative to RGC alone (g). N = 16 experiments. h-j, Astrocytes cluster
GluA1-containing AMPARs on the RGC surface. Example images (h) of
surface GluA1 (green) and RGC processes (red); bottom panel, enlargement of
GluA1 (boxed white). Quantification of number (i) and size (j) of GluA1
clusters. N = 10 experiments. k, Total synaptic activity induced by final protein
fraction from column fractionation of ACM. l, Total synaptic activity in RGCs
cultured in COS-7-cell-conditioned medium (COS-7 CM) transfected with a
control protein (green fluorescent protein (GFP)) or Gpc4, plus
thrombospondin (TSP). *P < 0.05, **P < 0.01, ***P < 0.001; error bars, s.e.m.


synapses, as shown by increased mEPSC frequency and amplitude, 
although the very large-amplitude events induced by astrocytes were 
absent (Fig. 2a-d and Supplementary Fig. 7). To determine whether 
increased synaptic activity was due to increased numbers of AMPARs 
on the cell surface, we isolated surface receptors and analysed them by 
western blotting. Gpc4 induced a 2.5-fold increase in surface GluA1, 
comparable to that observed in the presence of astrocytes; however, 
there was less of an increase in surface levels of the other subunits 
(Fig. 2e, f and Supplementary Fig. 9). Thus, Gpc4 specifically recruits
GluA1-containing AMPARs, and astrocytes release additional factors 
Figure 2 | Gpc4 is sufficient to strengthen glutamatergic synapses and
increase surface GluA1-containing AMPARs. a-d, Example mEPSC
recordings (a) show that frequency (b) and amplitude (c) are significantly
increased in RGCs cultured with Gpc4. Average traces aligned by rise time
(d). Average mEPSC amplitude: 13.8 ± 0.7 pA (RGC alone, N = 16 cells),
20.0 ± 1.2 pA (astrocyte, N = 16, P < 0.05), 17.8 ± 1.4 pA (Gpc4, N = 13,
P < 0.05). e, f, Gpc4 increases surface GluA1 AMPARs in RGCs. Western blots
(e) of surface AMPAR subunits GluA1, GluA2 and GluA4 and neuron-specific
enolase (NSE) loading control from total lysate, and quantification of band
intensity relative to RGC alone (f). For full blot results, see Supplementary Fig.
9. N = 6 experiments for GluA1 and GluA2; N = 3 for GluA4. g-i, Gpc4
clusters GluA1-containing AMPARs on the RGC surface. Example images
(g) of surface GluA1 (green) and RGC processes (red); bottom panel,
enlargement of GluA1 (boxed white). Quantification of number (h) and size
(i) of GluA1 clusters. N = 7 experiments. j, k, Gpc4 induces structural synapses.
Example images (j) of presynaptic (bassoon, red) and postsynaptic (homer,
green) staining; bottom panels, enlargements of the respective markers (boxed
white). Quantification of synapse number (k) (co-localization of pre- and
postsynaptic puncta). N = 6 experiments. *P < 0.05; error bars, s.e.m.


that bring GluA2, GluA3 and GluA4 AMPAR subunits to the synapse. 
Surface staining for GluA1 revealed a 2.5-fold increase in the number 
of receptor clusters on RGCs exposed to Gpc4, and a 20% increase in 
size, comparable to astrocytes (Fig. 2g-i). Gpc4-induced clustering of
GluA1 on RGCs is dose dependent, being effective at 0.1-10 nM
(comparable to levels in ACM) and ineffective at higher concentrations 
(Supplementary Fig. 6c, d). These experiments demonstrate that Gpc4 
is sufficient to strengthen pre-existing synapses by increasing mEPSC 
amplitude; however, we also observed an increase in the number of 
mEPSCs and GluA1 clusters on RGCs, suggesting that Gpc4 can induce 
new structural synapses. We assessed synapse number by counting co- 
localization of pre- and postsynaptic markers, and observed a signifi- 
cant threefold increase in synapse number in RGCs treated with Gpc4 
compared with RGCs grown alone (Fig. 2j, k). This demonstrates that
Gpc4 is synaptogenic, although to a lesser degree than astrocytes. 
The glypicans are a gene family with six members in mouse and 
human; Gpc6 is the most homologous to Gpc4 (ref. 9). We identified 
both Gpc4 and Gpc6 in the fractionation positive fraction (Supplemen- 
tary Table 2), so we assessed whether Gpc6 is also synaptogenic. Gpc6 
was sufficient to recruit GluA1 to the neuronal surface, and also
induced structural synapse formation to the same extent as Gpc4
(Supplementary Fig. 8). To determine whether Gpc4 and Gpc6 are
necessary for ACM to enhance postsynaptic activity, we used short 
interfering RNA (siRNA) to reduce expression of both in astrocytes, 
and confirmed protein reduction in ACM by western blotting (Supplementary
Fig. 10a). ACM with reduced Gpc4 and Gpc6 was unable to
Figure 3 | Gpc4 and Gpc6 are necessary for ACM to cluster surface GluA1,
and mechanism of action. a-c, Reduction of Gpc4 and Gpc6 levels in ACM
reduces its ability to increase mEPSC amplitude in RGCs. Example mEPSC
recordings (a), cumulative amplitude plot (b) and average traces aligned by rise
time (c). Average mEPSC amplitude: 14.6 ± 0.6 pA (RGC alone, N = 8 cells),
20.6 ± 0.9 pA (siRNA control ACM, N = 10, P < 0.05), 16.0 ± 1.0 pA (siRNA
Gpc4 and Gpc6 ACM, N = 11, P = 0.3). d, Reduction of Gpc4 and Gpc6 levels
in ACM prevents GluA1 surface clustering, which is rescued by expression of
siRNA-resistant Gpc4. N = 30 cells per condition. e, Reduction of Gpc4 and
Gpc6 levels in ACM does not prevent ACM-induced structural synapse
formation. N = 3 experiments. f, Time course of Gpc4-induced surface
clustering of GluA1 shows 18 h of treatment is required to increase surface
GluA1 levels. Dashed line, 6-d data from Fig. 2h. N = 3 experiments (4 h), 4
(18 h). g, Gpc4 does not rapidly induce structural synapse formation and
requires 3 d of treatment. N = 5 experiments (1 d), 3 (2 d), 6 (3 d), 3 (6 d).
h, Surface clustering of GluA1 is necessary for Gpc4-induced synapse
formation, shown by the inability of Gpc4 to induce synapse formation in RGCs
lacking GluA1. N = 5 experiments. *P < 0.05, **P < 0.0001; error bars, s.e.m.
significantly increase mEPSC amplitude in RGCs (Fig. 3a-c) and
unable to cluster GluA1 on the neuronal surface (Fig. 3d and
Supplementary Fig. 10c, d). Overexpression of the human form of 
Gpc4, which is resistant to the siRNA, rescued the ability of ACM to 
cluster GluA1 receptors on the neuronal surface (Fig. 3d). The use of 
siRNA against individual glypicans was not as effective in reducing 
the ability of ACM to enhance total synaptic activity (Supplementary 
Fig. 10b). ACM with reduced levels of Gpc4 and Gpc6 still induced a 
significant increase in structural synapse formation, presumably owing 
to the presence of other synaptogenic proteins such as thrombospondin 
and hevin** (Fig. 3e). These results show that Gpc4 and Gpc6 are 
necessary components of ACM for enhancing postsynaptic activity, 
and that there is functional redundancy between family members. 

To examine the time course of Gpc4 effects, we assayed surface
clustering of GluA1 after 4 h and 18 h of treatment. There was no
increase in receptor recruitment after 4 h, but after 18 h there was a
significant increase in surface GluA1, to the same level as after 6 d
(time point previously used; Fig. 3f and Supplementary Fig. 11a, b).
Thus, Gpc4 does not immediately capture GluA1 AMPARs on the
surface of the cell, which suggests that downstream signalling cascades
are involved. We examined structural synapse formation at 4 h, 18 h,
1 d, 2 d, 3 d and 6 d after Gpc4 addition, and found that synapse
number only significantly increased after 3 d (Fig. 3g and Supplementary
Fig. 11c, d). Therefore, Gpc4 first clusters GluA1-containing
AMPARs on the cell surface, and only then recruits postsynaptic
scaffolding molecules. This suggests that clustering of GluA1 is a
necessary step in Gpc4-induced synapse formation. To test this
hypothesis, we used siRNA to decrease GluA1 in RGCs, and asked
whether Gpc4 could still induce structural synapse formation
(Supplementary Fig. 11e). Whereas astrocytes still increased synapse
formation in RGCs lacking GluA1, Gpc4 did not (Fig. 3h). This mechanism
of synaptogenesis is distinct from that of thrombospondin,
which induces structural synapses that are postsynaptically silent’. 
Thus, astrocytes can induce excitatory synapse formation by at least 
two distinct mechanisms (Supplementary Fig. 19). 

Because they are heparan sulphate proteoglycans, glypicans can 
interact with signalling receptors and morphogens (for example 
Wnt, fibroblast growth factor, Hedgehog and bone morphogenetic 
protein) via their sugar chains’®. We generated a ‘mutant’ form of 
Gpc4 lacking glycosylation sites to investigate whether glycosylation 
is necessary for the synaptogenic effects of Gpc4 (Supplementary Fig. 12a). 
Glycosylation-deficient Gpc4 could not cluster GluA1 or induce struc- 
tural synapse formation (Supplementary Fig. 12b, c and Supplemen- 
tary Fig. 6d). Comparison by mass spectrometry of co-purified factors 
between full and mutant Gpc4 produced by HEK cells showed few 
differences between them, and these factors were not detected when 
Gpc4 was purified from astrocytes (data not shown), which can induce
an increase in synaptic activity (Fig. 2a-d). We nevertheless tested a 
number of known glypican interactors for their ability to cluster 
surface GluA1 on RGCs and found none to be effective (Supplementary
Fig. 12d). Thus, modification of Gpc4 by heparan sulphate is
necessary for its synaptogenic activity, and this may be due to the 
necessity of heparan sulphate for structural interaction with a receptor 
rather than delivery of associated morphogens, as has been shown for 
Dally-like binding to its receptor LAR”. 

We next investigated the roles of Gpc4 and Gpc6 in developmental
synapse formation and maturation in vivo. In situ hybridization
revealed overlapping messenger RNA expression throughout the brain
during postnatal development, particularly in the cortex, with Gpc4
enriched in the hippocampus and Gpc6 enriched in the cerebellum
(Fig. 4a and Supplementary Fig. 13a). Prior gene profiling of purified 
forebrain CNS cells showed that messenger RNA for both Gpc4 and 
Gpc6 is highly enriched in astrocytes compared with neurons’? 
(Supplementary Fig. 14a), which we confirmed using quantitative 
real-time PCR of purified astrocytes and neurons from the cortex 
and hippocampus (Supplementary Fig. 14b, c). Combining in situ
hybridization for Gpc4 with immunostaining for the astrocyte marker
GFAP demonstrated that hippocampal astrocytes express Gpc4 at
early postnatal periods (P6-P14) during the initiation of
synaptogenesis'’, and that astrocytic expression decreases with maturation
(P21) and switches to subsets of neurons (Fig. 4a and Supplementary
Fig. 13b-d). Therefore, developing astrocytes express different
glypicans with distinct but overlapping regional patterns.

Figure 4 | Mice deficient in Gpc4 have weaker excitatory synapses in vivo.
a, In situ hybridization of Gpc4 messenger RNA in P6 mouse hippocampus.
Left: Gpc4 (purple) is expressed in synaptic regions and co-localizes with
astrocytes (glial fibrillary acidic protein (GFAP), brown). Right: enlargement of
boxed region; arrows mark Gpc4-positive astrocytes. b-e, Mice lacking Gpc4
(KO) have weaker excitatory synapses in CA1 pyramidal neurons at P12.
Example mEPSC recordings (b), cumulative probability plot of mEPSC
interevent interval (c) (no significant difference in frequency: wild type,
0.88 ± 0.25 Hz; Gpc4-knockout, 0.55 ± 0.08 Hz; P = 0.16), cumulative
probability plot of mEPSC amplitude (d) (significant decrease in amplitude:
wild type, 20.67 ± 2.16 pA; Gpc4-knockout, 16.07 ± 0.96 pA; P < 0.05),
average traces aligned by rise time (e). N = 9 cells (Gpc4-knockout, five mice), 6
(wild type, five mice). f-i, Mice lacking Gpc4 recruit fewer GluA1 AMPARs to
synaptic sites in hippocampal CA1 at P12. Example array tomography image
(f) from wild type; VGLUT1 (red), GluA1 (green), MAGUK (blue), triple
co-localization (yellow circles). There is a significant decrease in triple
co-localization of VGLUT1 + MAGUK + GluA1 in Gpc4-knockout mice
(g). There is no significant difference in structural synapse number (VGLUT1
+ MAGUK), but there is a significant decrease in GluA1 association with
MAGUK in Gpc4-knockout mice (h) and no difference in individual synaptic
markers (i). N = 10 arrays per genotype from four wild-type and four
Gpc4-knockout mice. *P < 0.05; error bars, s.e.m.

To determine the in vivo function of Gpc4, we examined synapse
formation and function in hippocampal area CA1 of Gpc4-knockout
mice, and chose this region because Gpc4 is enriched here
(Supplementary Fig. 13a) and Gpc4 and Gpc6 have redundant functions
in vitro. We recorded mEPSCs from CA1 pyramidal neurons in acute 
hippocampal slices from Gpc4-knockout and wild-type littermate 
controls at P12, during functional synapse formation, and at P24, 
when more mature synapses are present. We observed a significant 
(22%) decrease in the amplitude of mEPSCs at P12, and at P24 a 
significant shift in amplitude distribution remained (Fig. 4b-e and 
Supplementary Figs 15 and 16). These effects were not due to gross 
differences in dendritic architecture or neuronal membrane properties 
(Supplementary Fig. 17), or to changes in total levels of synaptic 
proteins (Supplementary Fig. 18). We used array tomography to 
address whether the decreased mEPSC amplitude was due to a defect 
in structural synapse formation or to reduced recruitment of GluA1
receptors to synapses in hippocampal area CA1. Functional excitatory 
synapses were classified by triple co-localization of the presynaptic 
vesicular marker VGLUT1 with the postsynaptic density marker 
MAGUK, plus GluA1 at the postsynaptic side (Fig. 4f). There is a 
significant (22%) decrease in this class of synapse at P12 in Gpc4- 
knockout mice (Fig. 4g). This decrease is not due to a difference in 
the number of individual puncta (Fig. 4i) or to less structural synapse 
formation (VGLUT1 + MAGUK is not altered), but to a significant 
decrease in the co-localization of GluA1 with MAGUK (Fig. 4h).
Therefore, the same number of synapses form in Gpc4-knockout mice
as in wild type, but these synapses recruit fewer GluA1 AMPARs,
probably leading to the observed physiological defect of smaller 
mEPSCs. The amplitude decrease observed is similar to that seen when 
the GluA1 subunit of the AMPAR is removed from CA1 pyramidal 
neurons!*"*, and this is the subunit we have shown to be recruited to 
the surfaces of RGCs by Gpc4 in vitro (Fig. 2e, f). 
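As a quick arithmetic check, the 22% amplitude decrease quoted above can be recomputed from the mean mEPSC amplitudes given in the Figure 4 legend (wild type, 20.67 pA; Gpc4-knockout, 16.07 pA). The Python sketch below is illustrative only: the function name `percent_decrease` is an assumption introduced here and is not part of the original analysis.

```python
def percent_decrease(control: float, treated: float) -> float:
    """Return the percentage by which `treated` falls below `control`."""
    if control == 0:
        raise ValueError("control value must be non-zero")
    return (control - treated) / control * 100.0


# Mean mEPSC amplitudes at P12 as reported in the Figure 4 legend (pA).
wild_type_pa = 20.67
knockout_pa = 16.07

decrease = percent_decrease(wild_type_pa, knockout_pa)
print(f"mEPSC amplitude decrease: {decrease:.0f}%")
```

The same helper can be applied to any pair of group means; it says nothing about statistical significance, which the text assesses separately.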

Here we have identified glypicans as a novel family of astrocyte- 
secreted proteins that regulate glutamate receptor clustering and 
excitatory synapse formation in neurons. Other secreted factors 
have been described that can induce structural synapses that are post- 
synaptically silent (thrombospondins and hevin**), increase surface 
AMPAR levels (neuronal pentraxins and tumour necrosis factor-α
(refs 16, 17)) or alter AMPAR mobility and synaptic plasticity 
(extracellular matrix molecules’). Glypicans are the first non- 
neuronal secreted factors to be identified that are sufficient to induce 
functional synapses that cluster pre- and postsynaptic density proteins, 
are postsynaptically active and contain surface AMPARs. We have 
demonstrated that astrocytes increase GluA1 to GluA4 on the surfaces 
of neurons and that Gpc4 and Gpc6 specifically regulate GluA1 surface 
expression, indicating that astrocytes produce multiple factors that 
regulate the surface expression of different AMPAR subunits. 
Notably, specific AMPAR subunits may be used for the developmental 
initiation of synapse formation and also for the strengthening of 
synapses during long-term potentiation, with GluA1 used during
synapse initiation and GluA2 and GluA3 used during synapse 
maturation”. This would be consistent with a role for glypicans in 
initiating nascent synapse formation by recruiting postsynaptic 
GluA1, followed by the action of additional factors that induce synapse 
maturation. Astrocytes may control these events by releasing distinct 
factors, including glypicans and thrombospondin, with spatiotemporal 
specificity, providing a new mechanism by which astrocytes control the 
formation and maturation of neural circuitry. 

An attractive candidate receptor for inducing the actions of Gpc4 
and Gpc6 is LAR, a protein tyrosine phosphatase receptor. Drosophila
Dally-like can bind to and signal through LAR", and removal of LAR 
family members from mammalian hippocampal neurons reduces their 
ability to form synapses and to recruit AMPARs to synapses*’. Mice 
lacking Gpc6 die shortly after birth’ from apparent breathing 
difficulties (N.J.A. and B.A.B., unpublished observations), suggesting 
neural dysfunction, which is reminiscent of the phenotype observed in
neuroligin-triple-knockout mice”. In addition, mutations in Gpc6
have been associated with attention deficit hyperactivity disorder’®
and neuroticism” in humans, disorders in which synaptic dysfunction
has been implicated. Thus, the identification of glypicans as regulators
of functional synapse formation has important implications for the
formation of appropriate neuronal circuits in development and disease.


METHODS SUMMARY 


For detailed methods, see Methods. 

Purification and culture of RGCs and astrocytes. RGCs were purified by 
sequential immunopanning to greater than 99% purity from P5-P7 Sprague- 
Dawley rats (Charles Rivers) and cultured in serum-free medium containing 
BDNF, CNTF and forskolin on laminin-coated coverslips at 50,000 cells per well, 
as previously described’. Cortical astrocytes (MD astrocytes) were prepared as 
described in ref. 1. RGCs were cultured for 7-10 d to allow robust process out- 
growth, and were then cultured with astrocyte inserts, ACM, COS-7-conditioned 
media, thrombospondin, Gpc4 or Gpc6 for an additional 6d unless stated 
otherwise. 

Column fractionation procedure. ACM (3.3 mg), collected from 10 × 15 cm
plates of astrocytes, was diluted in ethanolamine buffer and passed over a heparin 
column to which the protein of interest did not bind. The unbound fraction was 
collected and passed over an anion column (Q column), to which the protein of 
interest did bind. The NaCl (salt) concentration was increased to 0.5 M to remove 
irrelevant proteins, and then to 2 M to elute proteins of interest. The 0.5-2 M anion 
column eluate was passed over a hydrophobic interaction column (phenyl (low 
sub)), to which the protein of interest did bind. The bound proteins were eluted by 
decreasing the NaCl concentration from 2 M to 0 M. This final eluate contained 
32 µg protein (Supplementary Table 1) and was sixfold enriched for functional
activity, and was analysed by mass spectrometry. 

Mice. Gpc4-knockout mice were obtained from Genentech/Lexicon and were 
generated by homologous recombination targeting exon 3, confirmed by 
Southern blotting’. No Gpc4 was detected by western blotting of conditioned
media prepared from Gpc4-knockout astrocytes, demonstrating that no functional
protein was being secreted (Supplementary Fig. 5c). 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature.

METHODS

Purification and culture of RGCs and astrocytes. RGCs were purified by 
sequential immunopanning to greater than 99% purity from P5-P7 Sprague- 
Dawley rats (Charles Rivers) and cultured in serum-free medium containing 
BDNF, CNTF and forskolin on laminin-coated coverslips at 50,000 cells per well, 
as previously described'’. Cortical astrocytes (MD astrocytes) were prepared as 
described in ref. 1. In some experiments, astrocytes were prepared by immuno- 
panning (IP astrocytes), as recently described in ref. 5. RGCs were cultured for 
7-10 d to allow robust process outgrowth, and were then cultured with astrocyte 
inserts, ACM, COS-7-conditioned media, thrombospondin, Gpc4 or Gpc6 for an
additional 6 d unless stated otherwise. 

RGC transfection. RGC transfection with siRNA, along with GFP to mark 
transfected cells, was carried out using Lipofectamine as described in ref. 28. An 
OnTarget Plus siRNA pool against GluA1 was obtained from Dharmacon
(catalogue number L-097755-01). In companion control experiments, RGCs were 
transfected with the same amount of either a targeting control (siCyclophilin B) or 
a non-targeting control (siControl) pool from Dharmacon; results obtained were 
the same with both. 

Mice. Gpc4-knockout mice were obtained from Genentech/Lexicon and were 
generated by homologous recombination targeting exon 3, confirmed by 
Southern blotting. No Gpc4 was detected by western blotting of conditioned
media prepared from Gpc4-knockout astrocytes, demonstrating that no functional
protein was being secreted (Supplementary Fig. 5c). Mice were maintained on a 
mixed genetic background (129/C57bl6/J, crossed three generations onto C57bl6/J),
because they seemed less viable when repeatedly crossed onto a pure C57bl6/J
background (N.J.A., unpublished observations). All experiments were carried out
using male wild-type and knockout littermates (glypican 4 is on the X chromosome,
so experiments compared Gpc4+/Y with Gpc4−/Y). To trace dendrites,
Gpc4-knockout mice were crossed with Thy1-GFP-M mice obtained from JAX.

Preparation of ACM and siRNA. Cortical astrocytes were cultured in 10-cm
plates in serum-containing medium until confluent. Cells were then washed three 
times with warm DPBS to remove serum, and placed in minimal conditioning 
medium for 4d. Conditioning media contained phenol-red-free Neurobasal, 
glutamine, pyruvate, penicillin and streptomycin. ACM was collected and first 
centrifuged to pellet dead cells and debris, then placed in centrifugal concentrators 
(Sartorius) with a size cut-off filter of 5kDa unless otherwise stated. ACM was 
concentrated 50-fold. Protein concentration was determined by Bradford assay, 
and ACM was fed to RGCs at 50-80 µg ml⁻¹. This method produced ACM that
induced synaptic activity when fed to RGCs the majority of the time, as long as 
astrocytes were used soon after isolation and not passaged extensively. 

For siRNA experiments, OnTarget Plus siRNA pools against rat glypican 4
(catalogue number, L-098055-01) and glypican 6 (catalogue number,
L-106892-01) were obtained from Dharmacon. Following validation, individual siRNAs
against glypican 4 (sequence 9) and glypican 6 (sequence 6) were selected and 
used in all experiments (Supplementary Fig. 6). In companion control experi- 
ments, astrocytes were transfected with the same amount of either a targeting 
control (siCyclophilin B) or a non-targeting control (siControl) pool from 
Dharmacon; results obtained were the same with both. In rescue experiments, 
astrocytes were co-transfected with complementary DNA (cDNA) for human 
glypican 4, which is resistant to the siRNA. Confluent astrocytes were put into 
single-cell suspension using trypsin, and Amaxa nucleofection was used to intro- 
duce siRNA into the astrocytes. Astrocytes were plated into RGC growth medium 
minus B27 and growth factors for 24h to recover from the transfection and to 
allow for knockdown of the RNA and subsequent reduction in secretion of the 
proteins of interest. Astrocytes were then washed three times with warm DPBS and 
placed in minimal conditioning medium for 3d before collection of ACM as 
outlined above. Comparable results were seen when Lipofectamine, rather than 
nucleofection, was used to introduce siRNA to astrocytes (data not shown). 
Knockdown of glypican 4 and glypican 6 was validated by western blotting of 
ACM for glypican 4 (Proteintech rabbit anti-glypican 4) and glypican 6 (R&D
goat anti-glypican 6). RGCs were fed with equal amounts of protein from control
and siGlypican4+6 ACM, at 30-40 µg ml⁻¹. ACM was fed to RGCs in a
range that was at the low end of the effective dose for control ACM, so that the 
effect of reducing the levels of glypican 4 and glypican 6 in ACM could be 
observed. This was necessary owing to incomplete knockdown of glypican 4 
and glypican 6 in astrocytes, so the levels present in ACM were reduced rather 
than eliminated.

Two-dimensional gel electrophoresis. ACM was concentrated as outlined above,
loaded onto an IPG strip, pH 3-10, and first separated by isoelectric point followed
by separation by mass on an SDS gel following manufacturers’ instructions
(Biorad). The separated proteins were visualized using silver staining following
manufacturers’ instructions (Invitrogen).

Column fractionation procedure. ACM was prepared as outlined above and
diluted into column loading buffer appropriate for each column. Columns (GE 
Healthcare) were prepacked and 1 ml in size, and fractionation procedures were 
carried out at room temperature (18-22 °C) in a tissue culture hood to keep the 
ACM sterile, using a syringe to apply manual pressure. Ethanolamine, pH 9.5 
(20 mM), was used as the buffer in all experiments. Initially each column was 
tested in isolation and then the columns were combined in series in the following 
procedure (Supplementary Fig. 3). ACM (3.3 mg), collected from 10 × 15 cm
plates of astrocytes, was diluted in ethanolamine buffer and passed over a heparin 
column to which the protein of interest did not bind. The unbound fraction was 
collected and passed over an anion column (Q column), to which the protein of 
interest did bind. The NaCl (salt) concentration was increased to 0.5 M to remove 
irrelevant proteins, and then to 2 M to elute proteins of interest. The 0.5-2 M anion 
column eluate was passed over a hydrophobic interaction column (phenyl (low 
sub)), to which the protein of interest did bind. The bound proteins were eluted by 
decreasing the NaCl concentration from 2 M to 0 M. This final eluate contained
32 µg protein (Supplementary Table 1) and was sixfold enriched for functional
activity, and was analysed by mass spectrometry. Column fractions were tested 
by feeding them to RGCs at known protein concentrations, in the presence of 
thrombospondin to induce structural synapses, followed by electrophysiological 
recording of total synaptic activity. Before they were fed to RGCs, the column 
fractions went through a buffer exchange from ethanolamine buffer to 
Neurobasal, to remove high levels of salt. Pierce Zeba desalt spin columns were 
used for buffer exchange following manufacturer’s instructions. 
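To put the three-column procedure in perspective, the protein mass recovered in the final eluate can be compared with the starting material (3.3 mg of ACM in, 32 µg out). The Python sketch below is a back-of-the-envelope calculation using only those two figures from the text; the helper name `recovery_percent` is an assumption introduced for illustration, and the calculation concerns protein mass only, not functional activity.

```python
def recovery_percent(start_ug: float, final_ug: float) -> float:
    """Percentage of the starting protein mass present in the final fraction."""
    if start_ug <= 0:
        raise ValueError("starting mass must be positive")
    return final_ug / start_ug * 100.0


# Values stated in the column fractionation procedure.
start_ug = 3.3 * 1000.0   # 3.3 mg of ACM protein, expressed in micrograms
final_ug = 32.0           # protein in the final phenyl-column eluate

recovery = recovery_percent(start_ug, final_ug)
fold_reduction = start_ug / final_ug

print(f"Protein recovered: {recovery:.2f}% of input")
print(f"Total protein reduced about {fold_reduction:.0f}-fold")
```

On these numbers roughly 1% of the input protein remains in the fraction that was analysed by mass spectrometry.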

Mass spectrometry analysis. Both the positive fraction (phenyl column eluate) 
and the negative fraction (phenyl column unbound fraction) were analysed for 
comparison. Proteins were analysed in solution, reduced with DTT and alkylated 
with iodoacetamide, and then digested with trypsin. The liquid peptide mixture 
was passed over an HPLC column to separate individual peptides, which were then 
spotted onto a MALDI plate. Individual spots on the MALDI plate were analysed 
using MS/MS on an Applied Biosystems machine. Peptides were identified using 
the manufacturer’s software and the NCBI database.

COS-7 cell overexpression screen. Expression constructs for the majority of the 
proteins identified from the column fractionation procedure were obtained from 
Open Biosystems or Origene. The identity of the cDNA was verified by sequencing 
before use. COS-7 cells were used as a ‘negative cell line’ in which to express 
candidates, as COS-7 CM induced little synaptic activity in RGCs (Supplemen- 
tary Fig. 4). COS-7 cells were transfected with cDNA using Lipofectamine 2000 
(Invitrogen) following the manufacturer’s instructions. Three hours after transfec- 
tion, cells were washed three times with warm DPBS and placed into minimal 
conditioning medium for 3 d, and conditioned media was collected as described 
for ACM. COS-7 CM was fed to RGCs in the presence of thrombospondin to 
induce structural synapse formation. 

Recombinant proteins and DNA constructs. Full-length cDNA for glypican 4 
(rat cDNA from Open Biosystems, clone no. 7124728) or glypican 6 (mouse
cDNA from Open Biosystems, clone no. 5008374) with a 6-histidine tag at the
amino terminus was cloned into pAPtag5 vector (GenHunter) between the SfiI
and XhoI sites. These were expressed in HEK293 cells or astrocytes, which were
transfected using Lipofectamine 2000 (Invitrogen) following the manufacturer’s 
instructions. The secreted recombinant protein was then purified from condi- 
tioned culture media by Ni-chelating chromatography using Ni-NTA resin 
(Qiagen) following the manufacturer’s instructions. The purity of the protein 
was assessed by running a sample on an SDS gel and staining with Coomassie 
blue (Pierce). Protein concentration was assessed by Bradford assay and western 
blotting for glypican 4 (rabbit anti-glypican 4, Proteintech) or glypican 6 (goat 
anti-glypican 6, R&D) and the 6-histidine tag (mouse anti-histidine, Abcam),
and glypican 4 and glypican 6 were fed to neurons at concentrations between 0.1
and 10 nM. Purified human platelet TSP1 was obtained from Haematologic
Technologies and fed to neurons at 5 µg ml⁻¹. For testing candidate glypican 4
interactors, recombinant proteins were purchased and fed at the concentrations 
indicated in the figure legend. 

Glycosylation-deficient glypican 4 was generated by site-directed mutagenesis 
against the presumed glycosylation attachment sites, as in ref. 29. Mutagenesis was 
confirmed by DNA sequencing, and by the loss of high-molecular-weight smearing 
when the purified protein was run on a western blot. 

Synapse assay on RGCs. For synapse quantification of RGC cultures, cells were 
fixed for 7 min with 4% paraformaldehyde (PFA), washed three times in
phosphate-buffered saline (PBS) and blocked in 200 µl of a blocking buffer containing 50%
normal goat serum and 0.1% Triton X-100 for 30 min. After blocking, coverslips
were washed three times in PBS, and 200 µl of primary antibody solution, consisting
of mouse anti-bassoon (1:500, Stressgen) and postsynaptic antibody against all
isoforms of homer (1:1,000, Chemicon), was added to each coverslip. Coverslips
were incubated overnight at 4 °C, washed three times in PBS and incubated with
200 µl of Alexa-conjugated secondary antibodies (Invitrogen) diluted 1:1,000 in
antibody buffer. Following incubation for 2 h, coverslips were washed three or four 
times in PBS and mounted in Vectashield mounting medium with DAPI (Vector 
Laboratories Inc.) on glass slides (VWR Scientific). Secondary-only controls were 
routinely performed and revealed no significant background staining. 

Mounted coverslips were imaged using a Nikon Eclipse E800 epifluorescence
microscope (Nikon). Healthy cells that were at least two cell diameters from their
nearest neighbours were identified and selected at random by eye using DAPI 
fluorescence. Eight-bit digital images of the fluorescence emission at both 594 and 
488 nm were recorded for each selected cell using a monochrome charge-coupled- 
device (CCD) camera and SPOT image capture software (Diagnostic Instruments, 
Inc.). Set exposures were used for each experiment, and were calculated from the 
positive control condition (RGC + astrocyte). Merged images were analysed for 
co-localized puncta by using a custom plug-in (written by Barry Wark and available 
on request from nallen@salk.edu) for the NIH image-processing package IMAGEJ.
In each experiment, 15-30 cells across three coverslips were imaged per condition.
Graphs are averages of values obtained from multiple experiments, as detailed in the
legend, and are normalized to the ‘RGC alone’ condition.

GluA1 live surface staining. Rabbit anti-GluA1 (Calbiochem), an antibody
recognizing the extracellular region of GluA1, was added to the cell culture medium
for 15 min at 37 °C at 5 µg ml⁻¹ to label surface AMPA receptors. Cell Tracker Red
CMTPX dye (Invitrogen) was added at 0.5 µM during the incubation to label the
whole cell. Cells were washed three times with DPBS to remove unbound antibody, 
fixed for 5 min in 4% PFA (the short fixation time prevents permeabilization of the 
cell), washed three times with PBS and blocked in 50% goat serum for 30 min at 
room temperature. After blocking, cells were washed three times with PBS and 
incubated in goat anti-rabbit Alexa 488 secondary antibody 1:500 for 1h at room 
temperature. Coverslips were washed three or four times in PBS and mounted in 
Vectashield mounting medium with DAPI (Vector Laboratories Inc.) on glass 
slides (VWR Scientific). Secondary-only controls were routinely performed and 
revealed no significant background staining. Rabbit polyclonal antibodies against 
intracellular proteins gave no staining using this method. 

Mounted coverslips were imaged using a Nikon Eclipse E800 epifluorescence
microscope. Healthy cells that were at least two cell diameters from their nearest
neighbours were identified at random using the Cell Tracker Red channel. Eight-bit
digital images of the fluorescence emission at both 594 and 488 nm were
recorded for each selected cell using a monochrome CCD camera and SPOT image 
capture software (Diagnostic Instruments, Inc.). Set exposures were used for each 
experiment, and were calculated from the positive control condition (RGC + 
astrocyte). The numbers of puncta per cell were analysed using the Integrated 
Morphometry application in METAMORPH. In each experiment, 15-30 cells 
across three coverslips were imaged per condition. Graphs are averages of values 
obtained from multiple experiments, as detailed in the legend, and are normalized 
to the ‘RGC alone’ condition.

Surface biotinylation and western blotting. For surface biotinylation of RGCs,
cells were plated in 35-mm wells at 250,000 cells per well, and two or three wells 
were pooled per condition. Biotinylation was performed using a Pierce Cell 
Surface Protein Isolation Kit with the following modifications. All steps were 
performed at 4 °C unless otherwise stated. Cells were placed on ice and washed 
twice with DPBS, and then biotin at 0.25 mg ml⁻¹ in DPBS was added for 20 min at
4°C to label surface proteins. Cells were washed three times with TBS to remove 
unbound biotin, and lysed in RIPA buffer. RIPA buffer was added to the well for 
5 min, and cells were removed by squirting, collected in an Eppendorf tube and 
rotated for 1h to solubilize membranes fully. Tubes were then spun at 
13,000 r.p.m. (16,000g) in an Eppendorf microcentrifuge for 15 min to pellet 
unsolubilized material, and the supernatant was collected. A sample was collected 
at this stage to analyse for total protein levels. The remainder of the solubilized 
proteins were mixed with streptavidin-conjugated beads overnight to isolate the 
biotinylated proteins. Unbound proteins were removed by centrifugation, and 
bound biotinylated proteins were collected by incubating the beads in SDS buffer 
plus DTT for 1h at room temperature. 
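The centrifugation step above quotes both a rotor speed (13,000 r.p.m.) and a relative centrifugal force (16,000g). The two are linked by the standard relation RCF = ω²r/g, with ω the angular velocity and r the rotor radius. The Python sketch below, offered only as a consistency check, derives the rotor radius implied by the two quoted values. The rotor model is not stated in the text, so the computed radius is an inference, and the function names are assumptions introduced here.

```python
import math

STANDARD_GRAVITY = 9.80665  # m s^-2


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) at a given speed and radius."""
    omega = 2.0 * math.pi * rpm / 60.0          # angular velocity, rad s^-1
    return omega ** 2 * (radius_cm / 100.0) / STANDARD_GRAVITY


def radius_cm_from_rcf(rpm: float, rcf: float) -> float:
    """Rotor radius (cm) needed to reach `rcf` at `rpm`."""
    omega = 2.0 * math.pi * rpm / 60.0
    return rcf * STANDARD_GRAVITY / omega ** 2 * 100.0


# Values quoted in the surface biotinylation protocol.
implied_radius = radius_cm_from_rcf(rpm=13_000, rcf=16_000)
print(f"Implied rotor radius: {implied_radius:.1f} cm")
```

An implied radius of roughly 8.5 cm is in the range typical of benchtop microcentrifuge rotors, so the two quoted figures are mutually consistent.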

Both total RGC lysate proteins and surface RGC proteins were separated by size 
on 4-15% gradient SDS gels (Biorad) and transferred to PVDF membranes. 
Membranes were blocked in 5% milk for 1 h at room temperature, washed three 
times in PBS-Tween and incubated overnight at 4 °C in primary antibodies: mouse 
anti-β-actin 1:5,000 (Sigma), rabbit anti-NSE 1:1,000 (Polyscience), rabbit
anti-GluA1 0.25 µg ml⁻¹ (Millipore), rabbit anti-GluA2 0.25 µg ml⁻¹ (Millipore), rabbit
anti-GluA2/3 2.5 µg ml⁻¹ (Millipore) and rabbit anti-GluA4 1:1,000 (Millipore).
Horseradish-peroxidase-conjugated anti-mouse and anti-rabbit (1:5,000) were 
used as secondary antibodies (Millipore), and the detection was performed with 
an ECL-plus kit from Amersham. Signals were acquired using a QImager CCD 
system and analysed using the software provided by the manufacturer. Band
intensity was normalized to the control (RGC alone) condition in each experiment,
and data are presented normalized to RGC alone.
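The per-experiment normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the manufacturer's software used in the study: the function name `normalize_to_control` and the band intensities in `hypothetical_blots` are invented for the example. Each experiment's band intensities are divided by that experiment's ‘RGC alone’ value, and the resulting fold changes are then averaged across experiments.

```python
from statistics import mean
from typing import Dict, List


def normalize_to_control(
    experiments: List[Dict[str, float]], control: str = "RGC alone"
) -> Dict[str, float]:
    """Average, across experiments, each condition's intensity relative to control."""
    fold_changes: Dict[str, List[float]] = {}
    for bands in experiments:
        control_intensity = bands[control]
        if control_intensity <= 0:
            raise ValueError("control band intensity must be positive")
        for condition, intensity in bands.items():
            fold_changes.setdefault(condition, []).append(
                intensity / control_intensity
            )
    return {condition: mean(values) for condition, values in fold_changes.items()}


# Hypothetical band intensities (arbitrary units) from two experiments.
hypothetical_blots = [
    {"RGC alone": 1200.0, "RGC + astrocyte": 3000.0},
    {"RGC alone": 800.0, "RGC + astrocyte": 2400.0},
]

normalized = normalize_to_control(hypothetical_blots)
print(normalized)
```

Normalizing within each experiment before averaging removes blot-to-blot differences in exposure and loading, so the control condition is 1.0 by construction.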

Tissue western blotting. Hippocampi were dissected out in ice-cold PBS, and 
then each hippocampus was homogenized in 250 pl RIPA buffer containing 
protease and phosphatase inhibitor cocktails (Pierce). Lysates were solubilized 
for one hour at 4°C, spun down to remove unsolubilized material, aliquoted 
and snap-frozen in liquid nitrogen, and stored at −80 °C until use. Protein
concentrations were determined by BCA assay and were typically 5 µg µl⁻¹. For each
sample, 5 µg of protein was loaded onto the gel. Blots were probed with antibodies
as indicated in the figure legend, and values were normalized to the β-tubulin
loading control for each sample. 

Electrophysiology in culture. Total synaptic activity and mEPSCs were recorded 
by whole-cell patch-clamping RGCs at room temperature at a holding potential of 
−70 mV. The extracellular solution contained 140 mM NaCl, 2.5 mM CaCl2,
2 mM MgCl2, 2.5 mM KCl, 10 mM glucose, 1 mM NaH2PO4 and 10 mM
HEPES (pH 7.4), plus TTX (1 µM) when mEPSCs were being recorded. Patch
pipettes had resistances of 3-5 MΩ and the internal solution contained 120 mM
potassium gluconate, 10 mM KCl, 10 mM EGTA and 10 mM HEPES (pH 7.2). We 
recorded mEPSCs using PCLAMP software for Windows (Axon Instruments), 
and analysed them using MINI ANALYSIS PROGRAM (SynaptoSoft). Results 
from at least four separate experiments were pooled for analysis.

Electrophysiology in hippocampal slices. Experiments were carried out on
littermate wild-type and Gpc4-knockout mice aged P12-P14 and P21-P25, and 
recordings and analysis were both carried out blind to genotype. A chilled choline- 
based cutting solution was used for dissection and slicing containing 78.3 mM 
NaCl, 23 mM NaHCO3, 23 mM glucose, 33.8 mM choline chloride, 2.3 mM KCl,
1.1 mM NaH2PO4, 6.4 mM MgCl2 and 0.45 mM CaCl2 (pH 7.4). Parasagittal
brain slices of thickness 250 µm containing the hippocampus were cut on a
Leica vibratome. Slices were then kept at 31 °C for 25 min in the choline cutting
solution and for 30 min in isotonic saline solution (125 mM NaCl, 25 mM
NaHCO3, 25 mM glucose, 2.5 mM KCl, 1.25 mM NaH2PO4, 2 mM MgCl2 and
2.5 mM CaCl2 (pH 7.4)). Oxygenation (95% O2, 5% CO2) was continuously
supplied during cutting, recovery and recording. Whole-cell voltage-clamp
recordings of hippocampal CA1 pyramidal neurons were performed. Patch
pipettes with resistances of 2-4 MΩ were pulled from thick-walled borosilicate
glass capillaries and filled with an internal solution containing 130 mM CsMeSO3
(Cs+ was used instead of K+ as the main cation to improve voltage uniformity),
4 mM NaCl, 10 mM HEPES, 5 mM EGTA, 0.5 mM CaCl2, 4 mM MgATP, 0.5 mM
Na2GTP, 5 mM QX-314 (to suppress voltage-gated sodium currents), with its pH
adjusted to 7.2 using CsOH. Access resistance was monitored throughout the
recording and was <20 MΩ.

Recordings were carried out at room temperature in flowing isotonic saline
containing 1 µM tetrodotoxin to block voltage-gated sodium channels, the
GABA-A receptor antagonist bicuculline (40 µM) and 25 µM D-AP5 to block
NMDA receptors to isolate AMPA-mediated mEPSCs. In control experiments 
all events were eliminated by application of the AMPA receptor antagonist NBQX. 
We recorded mEPSCs for 5min and analysed them using MINI ANALYSIS 
PROGRAM. Cumulative probability plots for the interevent interval and the 
amplitude of mEPSCs were generated for each cell, and plots from wild-type 
and knockout cells were averaged to produce the plots shown in the figures. In 
addition, data were analysed by taking the first 100 events from each cell (these 
particular events were used to prevent bias from very active cells), pooling them for 
wild-type and knockout condition, and generating cumulative probability plots 
from the pooled data. The same results were obtained using both analysis methods. 
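The pooled-event analysis described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the MINI ANALYSIS PROGRAM workflow used in the study: the function names and the toy amplitudes are hypothetical. It takes the first `n_events` events from each cell, so that very active cells do not dominate, pools them, and builds an empirical cumulative probability curve.

```python
from typing import List, Sequence, Tuple


def pool_first_events(
    cells: Sequence[Sequence[float]], n_events: int = 100
) -> List[float]:
    """Pool the first `n_events` event values from every cell."""
    pooled: List[float] = []
    for events in cells:
        pooled.extend(events[:n_events])   # cells with fewer events contribute all
    return pooled


def cumulative_probability(values: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return sorted values and the empirical cumulative probability at each."""
    if not values:
        raise ValueError("at least one event is required")
    ordered = sorted(values)
    n = len(ordered)
    probabilities = [(i + 1) / n for i in range(n)]
    return ordered, probabilities


# Toy mEPSC amplitudes (pA) for two hypothetical cells; two events per cell
# are taken here purely to keep the example short.
toy_cells = [[10.0, 20.0, 30.0], [15.0, 5.0]]
pooled = pool_first_events(toy_cells, n_events=2)
amplitudes, probabilities = cumulative_probability(pooled)
print(list(zip(amplitudes, probabilities)))
```

With the default `n_events=100` the function reproduces the event cap stated in the text; the same two helpers apply equally to interevent intervals.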

Analysis of dendrite length. In some recordings, Alexa 488 was included in the
patch pipette solution. Slices were fixed for 1 h in 4% PFA on ice following record- 
ing, and were washed and mounted in Vectashield for imaging. In parallel experi- 
ments, vibratome sections were prepared from wild-type and Gpc4-knockout mice 
that had been crossed with Thy1-GFP-M mice to label sparse subsets of CA1 
pyramidal neurons with GFP, and were processed in the same way. Labelled cells 
were imaged on a Leica confocal microscope. Z-stacks were acquired to encompass
all of the dendritic area, and the maximum projection of the dendritic area 
generated. Total dendrite length was analysed using HCA-vision NEURITE 
ANALYSIS software.

Purification of cortical and hippocampal neurons and astrocytes and
qRT-PCR. C57bl6 mouse pups at P7 were used for purifications. Astrocytes were
purified using integrin beta 5 immunopanning as described in ref. 5. Neurons 
were purified using L1 immunopanning as described in ref. 30. RNA was isolated 
using a Qiagen RNeasy kit following the manufacturer’s instructions. 

For expression analysis, template cDNA was prepared from 50-200 ng of total 
RNA by reverse transcription. Immunopanned cells from either cortex or 
hippocampus were paired, and purity was assessed by enrichment for astrocyte 
marker (GFAP) and neuron marker (Tuj1). Gene expression was quantified using 
quantitative real-time PCR (qRT-PCR) in combination with gene-specific 
primers and the SYBR GREEN system (Roche). The reactions were performed 
on an Eppendorf Realplex4 cycler (Eppendorf). All samples were run in duplicates. 
For each DNA fragment (primer pair), all biological samples, including the input 
standard curves, were amplified on a common 96-well plate in the same run. The 
following primer pairs were used: AGGTCGGTGTGAACGGATTTG (Gapdh 
forward); TGTAGACCATGTAGTTGAGGTCA (Gapdh reverse); TAGACCC 
CAGCGGCAACTAT (Tuj-1 forward); TTCCAGGTTCCAAGTCCACC (Tuj-1 
reverse); GGGGCAAAAGCACCAAAGAAG (GFAP forward); GGGACAACTT 
GTATTGTGAGCC (GFAP reverse); CTGGAGGGTCCTTTCAACATT (Gpc4 
forward); GACATCAGTAACCAGTCGGTC (Gpc4 reverse); TAGTCCTGTAT 
TGGCAGCCAC (Gpc6 forward); GGCTAATGTCTATAGCAGGGAA (Gpc6 
reverse). 

Primer efficiency was determined by generating standard curves. For 
confirmation, quantitative PCR products were sequenced. Values were normalized 
relative to Gapdh expression. Relative gene expression between paired samples 
was estimated using the 2^(-ΔΔCt) method. 
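
As an illustration of the 2^(-ΔΔCt) calculation with Gapdh as the reference gene, a minimal Python sketch follows. The function name and the Ct values are hypothetical, and the formula assumes amplification efficiencies close to 100% for both primer pairs, which is what the standard curves are meant to check.

```python
def relative_expression(ct_target_a: float, ct_ref_a: float,
                        ct_target_b: float, ct_ref_b: float) -> float:
    """Fold expression of a target gene in sample A relative to sample B.

    dCt = Ct(target) - Ct(reference) within each sample;
    ddCt = dCt(A) - dCt(B); fold change = 2 ** (-ddCt).
    """
    delta_a = ct_target_a - ct_ref_a
    delta_b = ct_target_b - ct_ref_b
    return 2.0 ** (-(delta_a - delta_b))


# Hypothetical Ct values: target gene in astrocytes (A) versus neurons (B),
# each normalized to the reference gene measured in the same sample.
fold = relative_expression(ct_target_a=24.0, ct_ref_a=18.0,
                           ct_target_b=26.0, ct_ref_b=18.0)
```

With these made-up values the target reaches threshold two cycles earlier in sample A after normalization, which corresponds to a fourfold higher relative expression.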
In situ hybridization. Full-length mouse cDNA for both glypican 4 and glypican 
6 was used to generate probes approximately 2 kb in size (Open Biosystems: Gpc4 
clone no.3967797, Gpc6 clone no.5008374). Digoxigenin-labelled, single- 
stranded antisense and sense riboprobes were prepared by transcription of the 
linearized plasmid using either T7 or Sp6 RNA polymerases and a DIG RNA 
labelling kit (Roche) as per the manufacturer’s instructions. In situ hybridizations 
were performed essentially as described in ref. 31 with some modifications for 
postnatal tissue. Briefly, 10-µm-thick, fresh-frozen sections of mouse brain were 
air-dried and fixed in 4% paraformaldehyde for 10 min. After washing with PBS 
three times for 10 min each time, sections were acetylated by incubation in 0.1M 
triethanolamine HCl and 0.25% acetic anhydride for 10 min. After washing with 
PBS three times for 5 min each time, hybridization was performed with probes at 
concentrations of 200-500 ng ml⁻¹ in a hybridization solution (50% formamide, 
5× SSC, 5× Denhardt’s solution, 250 µg ml⁻¹ baker’s yeast RNA and 100 µg ml⁻¹ 
salmon sperm DNA) at 72 °C for 16 h. After hybridization, slides were washed 
once in 0.2× SSC at 72 °C for 60 min, once in 0.2× SSC at room temperature for 
5 min and once in buffer B1 (0.1 M Tris HCl (pH 7.5) and 150 mM NaCl) for 
5 min, incubated in B1 with 10% normal goat serum for 1 h, and then incubated in 
B1 with 1% normal goat serum and 1:1000 anti-digoxygenin alkaline-phosphatase 
conjugated antibody (Roche) overnight at 4°C. Slides were then washed in B1 
three times for 5 min each time, equilibrated in buffer B3 (0.1 M Tris HCl (pH 9.5), 
100 mM NaCl and 50 mM MgCl2) four times for 15 min each time and then 
developed in B3 with 0.24 mg ml⁻¹ levamisole, 0.375 µl ml⁻¹ NBT and 3.5 µl ml⁻¹ 
BCIP (Roche) until a colour precipitate was visible. The reaction was stopped by 
washing once in TBST and four times in water, and slides were mounted with 
coverslips using Glycergel mounting medium (Dako). 

For GFAP immunostaining, anti-mouse GFAP antibody (Sigma) at 1:1000 was 
added with anti-digoxygenin antibody overnight at 4 °C. After washing in B1 three 
times for 5min each time, anti-mouse Fc horseradish-peroxidase-conjugated 
antibody was added in 1% goat serum in B1 (1:500) for 2 h at room temperature 
before alkaline phosphatase was developed as above. Following washes in water, 
horseradish peroxidase was developed using DAB reagent (Vector Labs) for 8 min. 
Slides were washed six times in water and mounted with coverslips using Glycergel 
mounting medium. 

Array tomography. Array tomography was carried out as described in ref. 32. 
Hippocampal slices from P12 Gpc4-knockout and wild-type littermate controls 
were prepared in the same way as for electrophysiological analysis (300-µm 
vibratome sections of live tissue), fixed with 4% PFA overnight at 4°C, and then 
dehydrated and embedded in LRWhite resin using the benchtop protocol. Ribbons 
of 30-35 serial ultrathin sections (70-100 nm) were cut on an ultramicrotome 
(Leica), mounted on subbed glass coverslips and immunostained using antibodies 
against VGLUT1 (guinea pig, Millipore), GluA1 (rabbit, Abcam) and pan- 
MAGUK (mouse, NeuroMabs). Antibody binding was visualized using Alexa- 
488-, Alexa-594- and Alexa-647-labelled goat secondary antibodies (Invitrogen). 
Sections were mounted using SlowFade Gold antifade with DAPI (Invitrogen). 
Images were collected on a Zeiss Axiovert 200M fluorescence microscope with an 
AxioCam HRm CCD camera, using a Zeiss 63×/1.4 NA Plan Apochromat 
objective. The hippocampal area CA1 stratum radiatum was imaged, to analyse 
synapses proximal to the pyramidal neuron cell body. 

Tissue volumes were reconstructed and aligned using IMAGEJ (NIH) and the 
multistackreg plug-in (Brad Busse), and then cropped to include CA1 stratum 
radiatum so that the same volume was analysed across animals. Image deconvolu- 
tion was carried out using a custom function in MATLAB (Gordon Wang). 
Individual puncta were identified for each channel, and then the number of 
MAGUK + GluA1 co-localized puncta associated with a VGLUT1 punctum was 
quantified using a separate custom function in MATLAB (Gordon Wang). Puncta 
numbers were normalized to wild-type values. 
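
The counting and normalization step can be pictured with the short Python sketch below. It is not the custom MATLAB function referred to above: the centroid-based representation of puncta, the two distance thresholds and all coordinates are assumptions chosen purely for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def has_neighbour(p: Point, others: Sequence[Point], max_dist: float) -> bool:
    """True if any punctum centroid in `others` lies within `max_dist` of `p`."""
    return any(math.dist(p, q) <= max_dist for q in others)


def count_synaptic_puncta(maguk: Sequence[Point], glua1: Sequence[Point],
                          vglut1: Sequence[Point],
                          coloc_dist: float, assoc_dist: float) -> int:
    """Count MAGUK puncta that co-localize with GluA1 (within coloc_dist)
    and are associated with a VGLUT1 punctum (within assoc_dist)."""
    return sum(
        1 for p in maguk
        if has_neighbour(p, glua1, coloc_dist) and has_neighbour(p, vglut1, assoc_dist)
    )


def normalize_to_wild_type(counts: List[float], wild_type_mean: float) -> List[float]:
    """Express puncta counts relative to the wild-type mean."""
    return [c / wild_type_mean for c in counts]


# Hypothetical centroids (µm) and thresholds (µm).
maguk = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0), (10.0, 0.0, 0.0)]
glua1 = [(0.1, 0.0, 0.0), (5.0, 5.0, 5.1)]
vglut1 = [(0.3, 0.0, 0.0)]
n_synapses = count_synaptic_puncta(maguk, glua1, vglut1, coloc_dist=0.2, assoc_dist=0.5)
normalized = normalize_to_wild_type([8.0, 12.0], wild_type_mean=10.0)
```

Only the first MAGUK punctum has both a GluA1 and a VGLUT1 partner in this toy example; the second lacks a nearby VGLUT1 punctum and the third lacks both.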
Data analysis and presentation. All graphs represent average data with s.e.m. 
error bars. Statistical analysis was either by Student’s t-test when only two groups 
were being compared, or by one-way analysis of variance when three or more 
groups were compared. 
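
A minimal Python sketch of the summary statistics named above is given here for orientation. It uses only the standard library; the helper names and the example numbers are hypothetical, the t statistic assumes equal variances (the classical Student's form), and no P value is computed because that would require a t-distribution routine.

```python
import math
import statistics
from typing import List, Tuple


def mean_sem(values: List[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample s.d. / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


def student_t(a: List[float], b: List[float]) -> Tuple[float, int]:
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, df


def choose_test(n_groups: int) -> str:
    """Test selection rule stated in the text."""
    if n_groups < 2:
        raise ValueError("at least two groups are needed for a comparison")
    return "Student's t-test" if n_groups == 2 else "one-way ANOVA"


# Hypothetical measurements for two groups.
wild_type = [1.0, 2.0, 3.0]
knockout = [2.0, 3.0, 4.0]
wt_mean, wt_sem = mean_sem(wild_type)
t_stat, dof = student_t(wild_type, knockout)
```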
PGC7 binds histone H3K9me2 to protect against 
conversion of 5mC to 5hmC in early embryos 


Toshinobu Nakamura, Yu-Jung Liu, Hiroyuki Nakashima, Hiroki Umehara, Kimiko Inoue, Shogo Matoba, 
Makoto Tachibana, Atsuo Ogura, Yoichi Shinkai & Toru Nakano 


The modification of DNA by 5-methylcytosine (5mC) has essential 
roles in cell differentiation and development through epigenetic 
gene regulation’. 5mC can be converted to another modified 
base, 5-hydroxymethylcytosine (5hmC), by the tet methylcytosine 
dioxygenase (Tet) family of enzymes*’. Notably, the balance 
between 5hmC and 5mC in the genome is linked with cell- 
differentiation processes such as pluripotency and lineage commit- 
ment*’. We have previously reported that the maternal factor 
PGC7 (also known as Dppa3, Stella) is required for the mainten- 
ance of DNA methylation in early embryogenesis, and protects 
5mC from conversion to 5hmC in the maternal genome®*’. Here 
we show that PGC7 protects 5mC from Tet3-mediated conversion 
to 5hmC by binding to maternal chromatin containing dimethy- 
lated histone H3 lysine 9 (H3K9me2) in mice. In addition, 
imprinted loci that are marked with H3K9me2 in mature sperm 
are protected by PGC7 binding in early embryogenesis. This type 
of regulatory mechanism could be involved in DNA modifications 
in somatic cells as well as in early embryos. 

Maternal-genome chromatin bears considerable DNA methylation 
and contains H3K9me2 (ref. 10). Conversely, little DNA methylation 
remains and no H3K9me2 is present in the chromatin of the paternal 
genome. We found that DNA methylation of naked DNA and chro- 
matin DNA did not lead to substantial differences in the binding of 
PGC7 to DNA (Supplementary Figs 1, 2a and 3 and Supplementary 
Discussion). By contrast, compared with the nucleosomes purified from 
wild-type embryonic stem (ES) cells, PGC7 showed significantly weaker 
binding to the nucleosomes purified from ES cells lacking G9a (also 
known as Ehmt2)—an H3K9me2-specific lysine methyltransferase—in 
which H3K9me2 was absent, without affecting DNA methylation status 
(Fig. 1a, b and Supplementary Figs 2b and 4a, b)'’. However, PGC7 
bound to nucleosomes purified from rescued G9a knockout (G9a−/−) 
ES cells, which expressed the short or long form of G9a (G9a−/−-G9a(S) 
and G9a−/−-G9a(L), respectively), with comparable affinity to the 
nucleosomes of wild-type ES cells (Fig. 1a, b). Next, we conducted a 
chromatin-immunoprecipitation (ChIP) assay in PGC7-null and 
PGC7-expressing ES cells using an anti-PGC7 antibody. As shown in 
Fig. 1c, nucleosomes that were bound by PGC7 specifically contained 
H3K9me2. H3 with other methylation modifications, however, did not 
show binding to PGC7. We confirmed the H3K9me2-dependent 
PGC7 association with chromatin in wild-type, DNA methyltransferase 
triple-knockout (Dnmt1−/− Dnmt3a−/− Dnmt3b−/−) mutants’? 
and G9a-null ES cells using a stepwise salt-extraction method”. 
PGC7 and H3 were similarly extracted from the nuclear pellets of wild-type, 
Dnmt1−/− Dnmt3a−/− Dnmt3b−/− and G9a−/−-G9a(S) 
ES cells (Fig. 1d and Supplementary Fig. 5a, b). By contrast, PGC7 
was extracted from nuclear pellets of G9a−/− ES cells under a lower 
NaCl concentration compared with that in wild-type and 
Dnmt1−/− Dnmt3a−/− Dnmt3b−/− cells (Fig. 1d and Supplementary 
Fig. 5a, b), showing that PGC7 binding to chromatin without 
H3K9me2 was weaker. These two experiments indicate that the association 
between PGC7 and chromatin is dependent on the presence of 
H3K9me2. 

We studied the in vitro binding of PGC7 to various histone peptides 
to see whether PGC7 directly binds to H3K9me2. Although all histone 
peptides containing residues 1-21 bound to PGC7 to some extent, 
H3K9me2 bound to PGC7 most strongly (Fig. 1e). The same binding 
assay using truncated versions of PGC7, namely PGC7ΔC (containing 
amino acids 1-75) and PGC7ΔN (containing amino acids 76-150), 
showed that PGC7 bound to H3K9me2 by its amino terminus 
(Supplementary Fig. 6). Next, we carried out a competitive-binding 
assay to determine the binding strength of PGC7 and H3K9me2. As 
shown in Fig. 1f, only the excess amount of H3K9me2, but not that of 
the other peptides, outcompeted the binding of PGC7 to H3K9me2, 
indicating that the binding of PGC7 to H3K9me2 was stronger than 
that of other histone modifications. 

PGC7 binding was higher at two representative H3K9me2-enriched 
loci, Magea2 and Wfdc15a, but not at the control Pou5f1 locus (Fig. 1g). 
By contrast, such enrichment was not observed under the G9a-null 
condition (Fig. 1g). These data clearly demonstrate that PGC7 specif- 
ically binds to the loci marked with H3K9me2 in vivo. Meanwhile, 
micrococcal nuclease (MNase) activity was inhibited significantly by 
the enforced expression of PGC7 and PGC7ΔNES, a nuclear export 
signal deleted mutant of PGC7 (ref. 8), in wild-type and 
Dnmt1−/− Dnmt3a−/− Dnmt3b−/− ES cells (Supplementary Fig. 5c-f 
and Supplementary Discussion). However, neither PGC7 nor 
PGC7ΔNES reduced MNase sensitivity in G9a−/− ES cells, indicating 
that the protective function of PGC7 required H3K9me2 (Supplementary 
Fig. 5a, b, g and h). These results indicate that the protective 
function of PGC7 is dependent on H3K9me2, but not DNA 
methylation. 

We next asked whether the H3K9me2-dependent binding of PGC7 
occurs under physiological conditions during early embryogenesis. 
Both the paternal and maternal pronuclei stained positively with 
anti-PGC7 antibody after conventional paraformaldehyde (PFA) 
fixation (PFA-Triton (PT) condition in Fig. 2a, b and Supplemen- 
tary Fig. 7). However, the staining pattern was completely different 
when zygotes were treated with Triton X-100 before PFA fixation’* 
(Triton-PFA (TP) condition in Fig. 2a, b and Supplementary Fig. 7). 
Under TP conditions, only the maternal pronucleus was labelled 
with the anti-PGC7 antibody, indicating that PGC7 in the paternal 
pronucleus was eluted by Triton X-100. In other words, PGC7 was 
tightly attached to a type of ‘architecture’ that was present in the 
maternal, but not the paternal, pronuclei. 

Considering the maternal pronucleus-specific localization of 
H3K9me2 (Fig. 2b) and the results of our experiments using ES cells 
(Fig. 1), we hypothesized that chromatin containing H3K9me2 is the 
crucial structure to which PGC7 strongly binds in the maternal pro- 
nucleus. To test this hypothesis we expressed Jhdm2a (also known as 
Figure 1 | Preferential binding of PGC7 to H3K9me2-marked chromatin. 
a, b, Electrophoretic gel-mobility shift assay of His-PGC7. Nucleosomes 
(approximately 10 µg) purified from wild-type (WT) and G9a−/− ES cells, and 
G9a−/− ES cells in which G9a−/−-G9a(S) and G9a−/−-G9a(L) forms of G9a 
were expressed were incubated with increasing concentrations of His-PGC7 
(2.5, 5, 10 and 20 µg). The binding mixtures were analysed on agarose gels 
(a). Ratios of high-molecular-weight (MW) PGC7-nucleosome complex 
(>5 kb) were determined using NIH Image J (b). Error bars indicate s.d. 
(n = 3). Binding affinities were significantly different between wild-type and 
G9a−/− ES cells (*P < 0.005, t-test). His-glutathione S-transferase (GST; 
20 µg) was used as a negative control in both experiments. Similar results were 
obtained in at least three independent experiments and representative results 
are shown. c, Histone-methylation status of PGC7-containing chromatin. 
Anti-PGC7 antibody was used to immunoprecipitate chromatin from 
PGC7−/− ES cells stably expressing PGC7. Immunoprecipitates (IP) and 
aliquots of the input protein were analysed by immunoblotting with antibodies 
against various histone modifications. Essentially the same results were 
obtained in two independent experiments. d, Chromatin-binding status of 
PGC7 in ES cells. Nuclei were isolated from stable-PGC7-expressing wild-type, 
Dnmt1−/− Dnmt3a−/− Dnmt3b−/− triple knockout (Dnmt TKO), G9a−/− or 
G9a−/−-G9a(S) ES cells and were treated with DNase I under various 


Kdm3a), an H3K9 methylation/dimethylation-specific demethylase’*, 
in zygotes to erase H3K9 dimethylation. This involved injection of 
Jhdm2a polyadenylated messenger RNA and its inactive mutant, which 
contains a histidine to alanine point mutation (Jhdm2a(H1122A)) in 
the JmjC domain and does not possess histone-demethylase function 
(Supplementary Figs 8, 9 and Supplementary Discussion). 

As expected, Jhdm2a expression abolished H3K9 methylation and 
dimethylation (Fig. 2c, d and Supplementary Fig. 10) but did not affect 
H3K9 trimethylation (Supplementary Fig. 11). Microinjection of 
Jhdm2a mRNA abolished PGC7 staining of the maternal pronuclei 
(Fig. 2e and Supplementary Fig. 12), resulting in the well-correlated 
PGC7 and H3K9me2 staining patterns under the TP condition (Sup- 
plementary Fig. 13). These data clearly indicate that the structure to 
which PGC7 strongly binds in the maternal pronucleus is H3K9me2- 
containing chromatin. Next, we examined whether DNA methylation 
was maintained after Jhdm2a expression. As shown in Fig. 2f, g, DNA 
methylation was not retained after H3K9me2 demethylation by 
Jhdm2a. In addition, the H3K9me2 and methylated-cytosine staining 
intensities were well correlated (Supplementary Fig. 14), and DNA 
methylation in the maternal pronucleus was not maintained at the pro- 
nuclear (PN) 5 stage without PGC7, as reported previously*. Embryonic 
development was impaired by reducing methylated H3K9, and the 
accessibility of the antibodies to chromatin is shown in Supplemen- 
tary Figs 15 and 16 and Supplementary Discussion. Taken together, 
concentrations of NaCl (100, 200, 300, 400 and 500 mM) to separate nuclear 
extract from nuclear debris. Equivalent amount of aliquots were analysed by 
immunoblotting with anti-PGC7 or anti-H3 antibodies. e, In vitro assay of the 
binding of recombinant PGC7 to various H3 tail peptides. Recombinant His- 
PGC7 and histone H3 tail peptides with various methylation modifications 
were used in an in vitro peptide-binding assay. Binding characteristics were 
analysed by immunoblotting with anti-PGC7 antibody. Similar results were 
obtained in three independent experiments. f, PGC7 competitive peptide- 
binding assay. N-terminal biotinylated H3K9me2 peptide was immobilized on 
streptavidin-sepharose beads and incubated with recombinant His-PGC7 in 
the presence of increasing amounts of unmodified H3 or the indicated 
methylated H3 peptides. Binding characteristics were analysed by 
immunoblotting with anti-PGC7 antibody. Similar results were obtained in two 
independent experiments. g, Concomitant binding of PGC7 to the chromatin 
loci marked with H3K9me2 in ES cells. A ChIP analysis using the indicated 
antibodies was conducted in the PGC7−/− and G9a−/− ES cells with or without 
enforced PGC7 expression. The percentages of each PCR product in the 
immunoprecipitated sample per those of the input samples are shown (mean 
and s.d., n = 3). Anti-mouse or -rabbit IgG was used for the negative controls 
and the signals per total input of the negative controls were <0.12% in all genes 
examined (data not shown). 


these observations indicate that PGC7 protects methylated DNA by 
binding to chromatin regions containing H3K9me2. 

Next, we confirmed the importance of PGC7 binding to H3K9me2- 
containing chromatin in a different context. DNA methylation of 
paternally imprinted genes such as the differentially methylated regions 
(DMRs) of the Dlk1-Gtl2 domain (where Gtl2 is also known as Meg3), 
H19 and Rasgrf1 is protected during the active demethylation in normal 
mouse development'’. However, DNA methylation of H19 and Rasgrf1 
DMRs was not maintained, whereas that of the Dlk1-Gtl2 DMR was 
maintained in PN5-stage embryos derived from PGC7-null oocytes. It is 
reasonable to assume that epigenetic modifications of H19 and Rasgrf1 
differ from those of Dlk1-Gtl2, and that epigenetic modifications 
residing only in the former genes are responsible for the protective 
function of PGC7. Although the histones in nucleosomes are replaced 
by protamine during spermiogenesis, a recent epigenomic analysis of 
human and mouse sperm revealed that histone-containing nucleosomes 
are preferentially retained at loci of developmental importance, includ- 
ing imprinted gene clusters, homeobox-containing (Hox) gene clusters 
and Nanog homeobox (Nanog), but not in the intergenic regions of 
protein kinases, cyclic-AMP-dependent catalytic B (Prkacb), or zona 
pellucida glycoprotein 4 pseudogene (Zp4-ps)'”"*. Thus, we carried 
out a ChIP analysis of the above-mentioned paternally imprinted genes 
(Dlk1-Gtl2, H19 and Rasgrf1) and three maternally imprinted genes 
(paternally expressed genes 1, 3 and 5 (Peg1 (Mest), Peg3, Peg5 (Nnat)), 
Figure 2 | Protection of the maternal genome from DNA demethylation by 
PGC7 through H3K9me2-containing chromatin in early embryos. 

a, Schematic diagram of the two pre-treatment procedures: PT and TP 
conditions. b, After PT or TP treatment, immunostaining was performed with 
the indicated antibodies (m, maternal pronuclei; p, paternal pronuclei; pb, polar 
body). PGC7 and H3K9me2 are shown in red and green, respectively; nuclei 
were stained with DAPI (blue). A total of 38 and 42 zygotes were stained, under 
PT and TP conditions, respectively. c, d, H3K9 dimethylation after 
microinjecting Jhdm2a or Jhdm2a mRNA coding for the H1122A mutation 
(Jhdm2a(H1122A)). Zygotes were injected with Jhdm2a or Jhdm2a(H1122A) 
mRNA and cultured for 4.5 h in potassium-enriched simplex optimized 
medium (KSOM). H3K9 dimethylation was analysed using anti-H3K9me2 
antibody (H3K9me2, red; DAPI, blue) (c). A total of 32 and 34 zygotes were 
injected with Jhdm2a and Jhdm2a(H1122A), respectively. A total of 55 non-injected 
zygotes were analysed as controls. The strength of the H3K9me2 
staining in maternal (M) and paternal (P) pronuclei was analysed 
(d). e, Analysis of PGC7 localization after microinjecting Jhdm2a or 
Jhdm2a(H1122A) mRNA. Zygotes were stained after TP fixation. PGC7 and 
histone H3 are shown in red and green, respectively; nuclei are stained with 
DAPI (blue). f, g, The methylation status of the parental genome after 
microinjecting Jhdm2a or Jhdm2a(H1122A) mRNA. Zygotes were injected 
with Jhdm2a mRNA and cultured for 4.5 h in KSOM. Demethylation of the 
parental genome was analysed using anti-5mC antibody (5mC, green; DAPI, 
blue) (f). A total of 33 and 34 zygotes were injected with Jhdm2a and 
Jhdm2a(H1122A) mRNA, respectively, and 41 non-injected zygotes were 
analysed as controls. The strength of 5mC staining in the maternal and paternal 
pronuclei was analysed (g). Scale bar, 20 µm. 


[Figure 3 graphic: bar charts, including panels titled H3K9me2 and H3K4me2; y axis, percentage IP of input; key: paternally imprinted gene, maternally imprinted gene, non-imprinted gene.] 


Figure 3 | Remaining H3K9me2 at the DMRs of two paternally imprinted 
genes, H19 and Rasgrf1, in mature sperm. a–c, A ChIP-quantitative (q)PCR 
analysis of mature sperm was conducted using anti-H3 (a), -H3K9me2 (b) and 
-H3K4me2 (c) antibodies. H19, Dlk1-Gtl2, Rasgrf1, Peg1, Peg3, Peg5, Nanog, 
Prkacb and Zp4-ps were analysed by qPCR. The mean and s.d. (n = 3) of the 
percentage of each PCR product in the immunoprecipitated sample compared 
with that in the input sample is shown. Anti-mouse or -rabbit IgG was used as a 
negative control. The signals per total input of the negative controls were 
<0.16% for all genes examined (data not shown). 


as well as the promoter and intergenic regions of three non-imprinted 
genes (Nanog, Prkacb and Zp4-ps) in mature mouse sperm. 

Histone enrichment was detected at all DMRs of the analysed 
imprinted genes and at the Nanog promoter region, but not at the 
Prkacb and Zp4-ps intergenic regions (Fig. 3a) as reported for human 
sperm. Notably, H3K9me2 was enriched at the H19 and Rasgrf1 DMRs 
(Fig. 3b), but enrichment was not observed at the DMR of the paternally 
imprinted domain Dlk1-Gtl2. Considering that the Nanog-promoter 
region from the paternal genome was demethylated in wild-type 
zygotes, even in the presence of PGC7 (ref. 19), H3K9me2 should be 
crucial for the PGC7 protective function. These observations strongly 
suggest that the remaining H3K9me2 is important for protecting 
the H19 and Rasgrf1 DMRs against active DNA demethylation after 
fertilization. It is reasonable to conclude that this protection is mediated 
by PGC7 binding at the loci containing H3K9me2. H3K4me2 was 
enriched at all maternally imprinted genes examined but not at the 
paternally imprinted genes (Fig. 3c and Supplementary Discussion). 
Taking the dynamics of H3K9me2 into account (see Supplementary 
Discussion), in the maternal genome and two paternally imprinted loci, 
the binding of PGC7 to H3K9me2-containing chromatin is critical for 
the protection of DNA methylation. 

We and another group found that the Tet3-mediated conversion of 
5mC to 5hmC takes place in the paternal genome but not in the maternal 
genome””””!, We also showed that this conversion occurred in the mater- 
nal genome of zygotes derived from PGC7-null oocytes. These observa- 
tions prompted us to analyse the effects of H3K9me2 on 5hmC status as 
an analogy to the PGC7-dependent regulation of zygotic 5mC. As shown 
in Fig. 4a, eradicating H3K9me2 by expressing Jhdm2a induced hydroxymethylation 
of the maternal genome, and the levels of H3K9me2 
exhibited an inverse correlation with those of 5hmC (Supplementary 
Fig. 17). In addition, RanBP5-mER (RanBP5 fused to a mutated oestro- 
gen receptor) expression, which inhibits PGC7 function by driving the 
subcellular protein out of the nucleus* (Supplementary Fig. 18), also 
induced maternal-genome hydroxymethylation (Fig. 4a). Although it 
has been reported that acetylation of H3K9 decreases in Jhdm2a−/− 
testes”, this effect was negligible in our experimental system (Sup- 
plementary Fig. 19). Taken together, these results indicate that PGC7 
protected 5mC from the Tet3-mediated conversion to 5hmC through 
binding to H3K9me2-containing chromatin, which is essentially ident- 
ical to the model in which 5mC is protected from active demethylation. 

Tet3 was detectable in both paternal and maternal pronuclei under 
PT conditions (Fig. 4b, c). However, Tet3 was detected only in paternal 
pronuclei under TP conditions (Fig. 4b, c), indicating that Tet3 was 
tightly bound to a structure that existed in the paternal but not the 
maternal chromatin. Furthermore, Tet3 was detectable in the maternal 
pronucleus after microinjecting Jhdm2a or RanBP5-mER mRNA and 
Figure 4 | Protection of the conversion of 5mC to 5hmC by PGC7 through 
H3K9me2-containing chromatin in early embryos. a, 5mC and 5hmC status 
after the microinjection of Jhdm2a or RanBP5-mER mRNA. Zygotes were 
injected with Jhdm2a or RanBP5-mER mRNA and cultured for 4.5 h in KSOM. 
The 5mC and 5hmC states were analysed using anti-5mC and anti-5hmC 
antibodies (5mC, green; 5hmC, red; DAPI, blue). Scale bar, 20 µm. b, c, Flag-tagged 
Tet3 mRNA was injected into zygotes obtained from wild-type or 
PGC7−/− female mice with or without Jhdm2a or RanBP5-mER mRNA and 
cultured for 4.5 h in KSOM. After PT or TP treatment as described in the Fig. 2 
legend, immunostaining was performed using an anti-Flag antibody. Tet3 is 
shown in green; nuclei were stained with DAPI (blue). Ten zygotes obtained from 
wild-type female mice injected with Flag-tagged Tet3 (Flag-Tet3) mRNA were 
stained with the anti-Flag antibody under PT conditions; 13 zygotes were injected 


was also detectable both in maternal and paternal pronuclei of PGC7- 
null fertilized eggs under TP conditions (Fig. 4b, c and Supplementary 
Fig. 20). These data clearly indicate that the tight binding of Tet3 to 
chromatin was inhibited by intranuclear PGC7 through its binding to 
H3K9me2-containing chromatin. Although we examined direct binding 
of PGC7 to Tet3, no obvious binding was observed (data not shown). 

To examine the effect of PGC7 on Tet3 in more detail, we carried out 
a stepwise salt-extraction analysis using the PGC7−/− ES cells. As 
shown in Fig. 4d, PGC7 considerably reduced the binding of Tet3 to 
chromatin, which was as tight as that of H3 without PGC7. By contrast, 
PGC7ΔC did not show this effect despite having a chromatin-binding 
affinity similar to that of full-length PGC7. Moreover, the expression of 
PGC7ΔC did not inhibit the endonuclease activity in the MNase assay, 
like Tet3 binding (Supplementary Fig. 21), indicating that the inhibitory 
effect of PGC7 on Tet3 was not caused by competitive binding. PGC7 
inhibited MNase digestion of the linker part of chromatin by binding 
to H3K9me2, a modified histone tail, as described in the discussion of 
the MNase assay of G9a−/− ES cells (Supplementary Fig. 5). PGC7 
(~17kDa) is a relatively small protein compared to the histone 
octamer (~100 kDa) and the deleted carboxy-terminal part of PGC7 
(~8 kDa), which is essential for the inhibitory effect on both Tet3 
binding and MNase activity. Therefore, it is probably too small to have 
a marked steric effect. Meanwhile, it is conceivable that the distribution 
of H3K9me2, through which PGC7 binds to chromatin, is not sufficiently 
dense. Considering these points, although we cannot exclude it 
completely, the possibility of a steric effect seems improbable. 
Therefore, we prefer the hypothesis that PGC7 inhibits the activity 
of the enzyme(s) acting on DNA, such as Tet3 and MNase, by means 
of a change in chromatin configuration. 

Although 5mC has been the only recognized DNA modification for 
many years in mammals, several reports have described a novel 5hmC 
with Flag-Tet3 mRNA alone and 16 zygotes were injected with Tet3 and Jhdm2a 
mRNA or with Tet3 and RanBP5-mER mRNA. These embryos were stained with 
anti-Flag antibody under TP conditions. Zygotes obtained from PGC7−/− female 
mice were injected with Flag-Tet3 mRNA. A total of 6 and 9 zygotes were stained 
with the anti-Flag antibody under PT and TP conditions, respectively (b) and the 
percentage of Flag staining in maternal (M) and paternal (P) pronuclei is shown 
(c). Scale bar, 20 µm. d, Effect of PGC7 on the chromatin binding of Tet3 in ES 
cells. Nuclei were isolated from PGC7−/− ES cells transfected with full-length 
PGC7, Flag-Tet3, PGC7 and Flag-Tet3, PGC7ΔC, and both PGC7ΔC and Flag- 
Tet3. These nuclei were treated with DNase I under various concentrations of 
NaCl (100, 200, 300, 400 and 500 mM) to separate nuclear extract from nuclear 
debris. Equivalent amount of aliquots were analysed by immunoblotting with 
anti-PGC7, anti-Flag and anti-H3 antibodies as described. 


DNA modification”’. The discovery of 5hmC raises numerous questions, 
including its function, tissue localization and regulatory mechanisms of 
modification. The tight differential regulatory patterns of 5mC and 
5hmC in paternal and maternal pronuclei imply that 5hmC is import- 
ant’’°*", Here, we showed the molecular function of PGC7 during the 
modification as a first step towards understanding the regulatory 
mechanisms of 5-hydroxymethylation in DNA. It is noteworthy that 
regulation of reciprocal DNA methylation and hydroxymethylation 
was controlled by the same protein, PGC7, through binding to 
H3K9me2-containing chromatin. Two recent studies have raised the 
possibility that 5hmC is an intermediate during DNA demethylation 
and that the Tet family are the critical enzymes for this process”*. 
Our conclusion is consistent with this notion, and the inhibition of 
Tet3 activity through the binding of PGC7 to H3K9me2 would explain 
the regulatory function of PGC7 in global DNA demethylation during 
early embryonic development (Supplementary Fig. 22). 


METHODS SUMMARY 


Details of cell culture, gel-shift assay, ChIP-western blotting, sperm collection, 
chromatin preparation, plasmids, global DNA methylation analysis, MNase assay, 
histone peptide-binding assay, stepwise salt extraction, zygote collection and 
culture, Triton treatment of zygotes, immunohistochemistry, ChIP-quantitative 
PCR and sperm chromatin preparation can be found in Methods. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 

Cell culture. G9a−/−, Dnmt1−/− Dnmt3a−/− Dnmt3b−/− and PGC7−/− ES cells 
and their clones were used. PGC7−/− ES cells were derived from blastocysts. These 
cells were maintained as described previously’. 

Gel-shift assay. Purified mononucleosomes were incubated with various amounts 
of purified histidine-tagged PGC7 (His-PGC7) on ice for 30 min in 50 mM NaCl, 
20 mM Tris buffer-HCl (pH 7.5), 2 mM EDTA buffer, 5 mg ml⁻¹ BSA buffer and 
5% glycerol. The samples were electrophoresed on a 1% native agarose gel and the 
chromatin was visualized using ethidium bromide. 

ChIP-western blotting. Mononucleosomes prepared from PGC7−/− ES cells and 
a cell line stably expressing PGC7 were incubated with anti-PGC7 antibody for 3h 
at 4 °C. After adding protein G-Sepharose 6 Fast Flow (GE Healthcare), the samples 
were incubated for another hour at 4°C, washed and eluted from the beads using 
SDS sample buffer. The immunoprecipitates were analysed by immunoblotting as 
described previously’. 

Sperm collection. Sperm was obtained from ICR mice aged 10-20 weeks. The 
cauda epididymides were partially cut open in human tubal fluid (Millipore) and 
incubated for 1 h at 37 °C in 5% CO2 to allow the sperm to swim out. The sperm 
pellet was washed with PBS buffer and ChIP-quantitative (q)PCR analysis was 
performed as described in the Supplementary Information. 

Chromatin preparation. Cell pellets were re-suspended in buffer I (300 mM 
sucrose, 60 mM KCl, 15 mM NaCl, 5 mM MgCl2, 0.1 mM EGTA buffer, 15 mM 
Tris-HCl (pH 7.5), 0.4% tergitol-type NP-40 (NP-40) and 0.5 mM DTT), and an 
equal volume of buffer II (300 mM sucrose, 60 mM KCl, 15 mM NaCl, 5 mM 
MgCl2, 0.1 mM EGTA, 15 mM Tris-HCl (pH 7.5) and 0.5 mM dithiothreitol 
(DTT)) was added. After incubation on ice for 10 min, the cell suspensions were 
layered over buffer III (1.2 M sucrose, 60 mM KCl, 15 mM NaCl, 5 mM MgCl2, 
0.1 mM EGTA, 15 mM Tris-HCl (pH 7.5) and 0.5 mM DTT) and the nuclear 
pellets were collected by centrifugation (10,000g, 4 °C, 20 min). To avoid carry-over 
of NP-40, the supernatant was carefully removed at least three times, using a new 
Pasteur pipette each time. Nuclear pellets were washed in MNase digestion 
buffer (320 mM sucrose, 50 mM Tris-HCl (pH 7.5), 4 mM MgCl2 and 1 mM 
CaCl2) and then re-suspended in MNase digestion buffer. Chromatin was released 
from the nuclear preparations by digestion with 20 U ml⁻¹ MNase (Takara) at 
37 °C for 9 min. Digestion was stopped by adding 0.2 M EDTA to a final 
concentration of 5 mM on ice. Chromatin preparations were analysed by native 
agarose gel electrophoresis (Supplementary Fig. 1) and the mononucleosomes 
were used immediately for the gel-shift assay. 

Plasmids. The Jhdm2a complementary DNA was cloned into pcDNA4mycHisA 
(Invitrogen). The Jhdm2a(H1122A) mutant was generated by PCR-based muta- 
genesis and confirmed by sequencing. The primers used to generate the mutant are 
described in Supplementary Table 2. 

Global DNA methylation analysis. A previous protocol for global DNA 
methylation analysis” was used with slight modifications. Genomic DNA was isolated 
from various ES cells with proteinase K and RNase A, followed by 
phenol/chloroform extraction and ethanol precipitation. A 2-µg aliquot of genomic DNA was 
digested with 50 U of methylation-sensitive HpaII or the methylation-insensitive 
isoschizomer MspI for 16-18 h at 37 °C. The digested genomes were purified by 
phenol/chloroform extraction and ethanol precipitation, and 250 ng of purified 
DNA was labelled with ³H-deoxycytidine triphosphate (dCTP) at 56 °C for 1 h 
using a single-nucleotide extension reaction. Undigested genomic DNA served as 
a background control. The samples were applied to DE-81 ion-exchange filters and 
washed three times with 0.5 M Na3PO4 buffer (pH 7.0) at room temperature. The 
filters were then dried and processed for scintillation counting. 

MNase assay. Living cells were permeabilized on ice with 0.02% L-α-lysolecithin 
(Sigma) in 150 mM sucrose, 35 mM HEPES-NaOH (pH 7.4), 5 mM K2HPO4, 
5 mM MgCl2 and 0.5 mM CaCl2 for 90 s, followed by digestion with 3 U ml⁻¹ 
MNase (Takara) in 150 mM sucrose, 50 mM Tris-HCl (pH 7.5), 50 mM NaCl and 
2 mM CaCl2 at room temperature for 0, 2, 4, 6, 8, 10, 30 and 60 min. Digestion was 
stopped by adding EDTA to a final concentration of 5 mM, on ice. DNA was purified 
by phenol/chloroform extraction and electrophoresed on 0.8% native agarose gels. 

Histone peptide-binding assay. Biotinylated histone peptides were purchased 
from Upstate Biotechnology. In brief, biotinylated histone peptides (0.5 µg) were 
incubated with streptavidin beads (GE Healthcare) in binding buffer (50 mM 
Tris-HCl (pH 8.0), 300 mM NaCl and 0.1% NP-40) for 1 h at 4 °C with rotation. 
After extensive washing, the beads were incubated with 1 µg of purified recombinant 
His-PGC7 for 2 h at 4 °C with rotation. For the competitive-binding assay, 
unbiotinylated histone peptides were added at 10- and 100-fold excess. After extensive washing, 
bound PGC7 protein was analysed by SDS-PAGE and immunoblotting with 
anti-PGC7 antibody. 


Stepwise salt extraction. ES cells were suspended in nuclear isolation buffer 
(10 mM Tris-HCl (pH 7.5), 60 mM KCl, 15 mM NaCl, 1 mM DTT, 1.5 mM 
MgCl2, 1 mM CaCl2, 250 mM sucrose, 10% glycerol and 0.15% 
NP-40) on ice for 10 min. Nuclear pellets were treated with 100 U ml⁻¹ DNase I 
(Takara) in nuclear isolation buffer with increasing amounts of NaCl (100, 200, 
300, 400 and 500 mM) at 25 °C for 20 min and on ice for 10 min. After the 
incubation, EDTA was added to a final concentration of 5 mM, and the sample 
was incubated on ice for 10 min. Nuclear extracts were separated from pellets by 
centrifugation. 

Zygote collection and culture. Female B6D2F1 mice >8 weeks old were super- 
ovulated by injecting 5 U of human chorionic gonadotropin 48 h after injecting 
5U of pregnant mare serum gonadotropin, and then mated with male mice. 
Fertilized eggs were collected from the oviduct, placed in 100-µl drops of 
KSOM (Millipore) and cultured at 37 °C in an atmosphere of 5% CO2. 

Triton treatment of zygotes. Zygotes were treated with Triton X-100, similar to a 
previous report with minor modifications. Zygotes were treated with 0.2% Triton 
X-100 in PBS for 45-60 s until the perivitelline space was eliminated. Immediately 
after Triton treatment, the zygotes were washed with PBS at least five times and 
then fixed in 4% PFA. After washing with PBS, immunostaining was performed as 
described below. 

Immunohistochemistry. Fertilized eggs were washed with PBS, fixed for 15 min 
in 4% PFA in PBS at room temperature and permeabilized with 0.2% Triton X-100 
in PBS for 20 min at room temperature. The eggs were blocked for 1 h in 5% 
normal goat serum in PBS at room temperature and incubated overnight at 
4 °C with primary antibodies as shown in Supplementary Table 1. The following 
day, the eggs were washed three times with 0.05% Tween 20 in PBS and staining 
was detected by incubating the eggs with secondary antibodies as shown in 
Supplementary Table 1. Nuclei were stained with 1 µg ml⁻¹ DAPI. 5mC, 5hmC 
and DNA staining was performed as described previously*. Immunofluorescence 
was visualized using an LSM510 confocal laser scanning microscope (Carl Zeiss). 

ChIP-qPCR. PGC7−/−, PGC7−/−-PGC7, G9a−/− and G9a−/−-PGC7 ES cells 
(3 X 10”) were treated with 1% PFA for 8 min at room temperature. After quench- 
ing the PFA crosslinking reaction with 200 mM glycine, the fixed cells were washed 
with PBS. The cells were suspended in radio immunoprecipitation assay (RIPA) 
buffer (50 mM Tris-HCl (pH 8), 150 mM NaCl, 1 mM EDTA, 1% NP-40, 0.5% 
deoxycholate and 0.1% SDS) and sonicated to an average fragment size of 
200-1,000 bp. Solubilized chromatin was clarified by centrifugation for 10 min at 15,000 
r.p.m. and 4 °C. The supernatant was pre-cleared with protein G-Sepharose beads, 
which were pre-blocked with salmon sperm DNA and BSA, at 4 °C for 1 h. The 
pre-cleared chromatin was incubated with anti-PGC7, anti-H3 and 
anti-H3K9me2 antibodies for 14-18 h at 4 °C. Immune complexes were bound to 
pre-blocked protein G-Sepharose beads for 2 h at 4 °C. The beads were washed 
with RIPA buffer, high-salt wash buffer (20 mM Tris-HCl (pH 8), 500 mM NaCl, 
1 mM EDTA, 1% NP-40, 0.5% deoxycholate and 0.1% SDS), LiCl wash buffer 
(250 mM LiCl, 20 mM Tris-HCl (pH 8), 1 mM EDTA, 1% NP-40 and 0.5% 
deoxycholic acid (DOC)), and once with Tris-EDTA. Immune complexes bound 
to protein G beads were suspended in elution buffer (20 mM Tris-HCl (pH 8), 
300 mM NaCl, 1 mM EDTA and 0.5% SDS) and incubated for 6 h at 65 °C. After 
incubation, the samples were treated with 30 µg ml⁻¹ RNase A for 1 h at 37 °C, and 
100 µg ml⁻¹ proteinase K for 8 h at 56 °C. DNA was extracted with 
phenol/chloroform and precipitated with ethanol plus Dr.GenTLE (Takara) as a carrier. 
Precipitated DNA was re-suspended in 40 µl of water and analysed by qPCR using 
the specific primers shown in Supplementary Table 2. 

Sperm chromatin preparation. Sperm pellets were washed with PBS, suspended 
in lysis buffer (0.1% SDS, 0.5% Triton X-100 in PBS) and incubated on ice for 
20 min. After centrifugation, the sperm pellets were washed with PBS, suspended 
in 0.05% L-α-lysolecithin (Sigma) in PBS and incubated on ice for 15 min. The 
sperm pellets were collected by centrifugation, suspended in 10 mM DTT in PBS 
and incubated on ice for 10 min. After centrifugation, sperm pellets were fixed with 
1% PFA in PBS for 8 min at room temperature. Next, 2.5 M glycine was added to a 
final concentration of 0.2 M, and incubation was continued for an additional 
10 min at room temperature. After washing with PBS, the pellet was suspended 
in RIPA buffer (20 mM HEPES-NaOH (pH 7.5), 150 mM NaCl, 1 mM EDTA, 1% 
NP-40, 0.5% DOC and 0.1% SDS) and disrupted by sonication using a Bioruptor 
(Diagenode). ChIP-qPCR was conducted as described above using the specific 
primers shown in Supplementary Table 2. 
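
Several steps above bring a concentrated stock to a stated final concentration (for example, 2.5 M glycine added to a final concentration of 0.2 M). As a minimal sketch, the volume of stock to add follows from C_stock × V_add = C_final × (V_sample + V_add). The helper name and the 1 ml sample volume below are assumptions for illustration only; they are not values given in the protocol.

```python
def stock_volume_to_add(c_stock, c_final, v_sample):
    """Return the volume of stock that brings the mixture to c_final.

    Solves c_stock * v_add = c_final * (v_sample + v_add) for v_add.
    Both concentrations use one unit (e.g. M); both volumes use one
    unit (e.g. ml). The sample is assumed to contain none of the solute.
    """
    if c_final <= 0 or c_stock <= c_final:
        raise ValueError("stock must be more concentrated than the target")
    return c_final * v_sample / (c_stock - c_final)


# Hypothetical 1 ml fixation reaction quenched with 2.5 M glycine to 0.2 M.
v_add_ml = stock_volume_to_add(c_stock=2.5, c_final=0.2, v_sample=1.0)
print(f"add {v_add_ml * 1000:.0f} µl of 2.5 M glycine per ml of sample")
```

The same function applies to any stock addition in which the added volume is not negligible compared with the sample volume.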


25. Pogribny, I., Yi, P. & James, S. A sensitive new method for rapid detection of 

Highly pathogenic avian H5N1 influenza A viruses occasionally 
infect humans, but currently do not transmit efficiently among 
humans. The viral haemagglutinin (HA) protein is a known 
host-range determinant as it mediates virus binding to host- 
specific cellular receptors’ °. Here we assess the molecular changes 
in HA that would allow a virus possessing subtype H5 HA to be 
transmissible among mammals. We identified a reassortant H5 
HA/H1N1 virus, comprising H5 HA (from an H5N1 virus) with 
four mutations and the remaining seven gene segments from a 
2009 pandemic H1N1 virus, that was capable of droplet 
transmission in a ferret model. The transmissible H5 reassortant virus 
preferentially recognized human-type receptors, replicated effi- 
ciently in ferrets, caused lung lesions and weight loss, but was 
not highly pathogenic and did not cause mortality. These results 
indicate that H5 HA can convert to an HA that supports efficient 
viral transmission in mammals; however, we do not know whether 
the four mutations in the H5 HA identified here would render a 
wholly avian H5N1 virus transmissible. The genetic origin of the 
remaining seven viral gene segments may also critically contribute 
to transmissibility in mammals. Nevertheless, as H5N1 viruses 
continue to evolve and infect humans, receptor-binding variants 
of H5N1 viruses with pandemic potential, including avian-human 
reassortant viruses as tested here, may emerge. Our findings 
emphasize the need to prepare for potential pandemics caused by 
influenza viruses possessing H5 HA, and will help individuals con- 
ducting surveillance in regions with circulating H5N1 viruses to 
recognize key residues that predict the pandemic potential of iso- 
lates, which will inform the development, production and distri- 
bution of effective countermeasures. 

Although H5N1 viruses continue to cause outbreaks in poultry and 
there are cases of human infection in Indonesia, Vietnam, Egypt and 
elsewhere (http://www.who.int/influenza/human_animal_interface/H5N1_ 
cumulative_table_archives/en/index.html), they have not acquired the 
ability to cause human-to-human transmission. Investment in H5N1 
vaccines has therefore been questioned. However, because humans 
lack immunity to influenza viruses possessing an H5 HA, the emergence 
of a transmissible H5-HA-possessing virus would probably cause a 
pandemic. To prepare better for such a scenario, it is critical that we 
understand the molecular changes that may render H5-HA-possessing 
viruses transmissible in mammals. Such knowledge would allow us to 
monitor circulating or newly emerging variants for their pandemic 
potential, focus eradication efforts on viruses that already have 
acquired subsets of molecular changes critical for transmission in 
mammals, stockpile antiviral compounds in regions where such viruses 
circulate, and initiate vaccine generation and large-scale production 
before a pandemic. Therefore, we studied the molecular features that 
would render H5-HA-possessing viruses transmissible in mammals. 

Previous studies suggested that HA has a major role in host-range 
restriction of influenza A viruses’ °. The HA of human isolates preferentially 
recognizes sialic acid linked to galactose by α2,6-linkages 
(Siaα2,6Gal), whereas the HA of avian isolates preferentially recognizes 
sialic acid linked to galactose by α2,3-linkages (Siaα2,3Gal)’. A small 
number of avian H5N1 viruses isolated from humans show limited 
binding to human-type receptors, a property conferred by several amino 
acid changes in HA*®. None of the H5N1 viruses tested transmitted 
efficiently in a ferret model'®, although, while our paper was under 
review, one study* reported that a virus with a mutant H5 HA and a 
neuraminidase (NA) of a human virus in the H5N1 virus background 
caused respiratory droplet transmission in one of two contact ferrets. 

To identify novel mutations in avian H5 HAs that confer human- 
type receptor-binding preference, we introduced random mutations 
into the globular head (amino acids 120-259 (H3 numbering), which 
includes the receptor-binding pocket) of A/Vietnam/1203/2004 
(H5N1; VN1203) HA (Supplementary Fig. 1). Although this virus 
was isolated from a human, its HA retains avian-type receptor-binding 
properties®’’. We also replaced the multibasic HA cleavage sequence 
with a non-virulent-type cleavage sequence, allowing us to per- 
form studies in biosafety level 2 containment (http://www.who.int/ 
csr/resources/publications/influenza/influenzaRMD2003_5.pdf). The 
mutated polymerase chain reaction (PCR) products were cloned into 
RNA polymerase I plasmids'® containing the VN1203 HA comple- 
mentary DNA, which resulted in Escherichia coli libraries representing 
the randomly generated HA variants. Sequence analysis of 48 
randomly selected clones indicated an average of 1.0 amino acid 
changes per HA globular head (data not shown). To generate an 
H5N1 virus library, plasmids for the synthesis of the mutated HA gene 
and the unmodified NA gene of VN1203 were transfected into human 
embryonic kidney (293T) cells together with plasmids for the synthesis 
of the six remaining viral genes of A/Puerto Rico/8/34 (H1N1; PR8), a 
laboratory-adapted human influenza A virus. 

Turkey red blood cells (TRBCs; which possess both Siaα2,6Gal and 
Siaα2,3Gal on their surface (data not shown)) were treated with 
Salmonella enterica serovar Typhimurium LT2 sialidase, which 
preferentially removes α2,3-linked sialic acid (that is, avian-type receptors), 
creating TRBCs that predominantly possess Siaα2,6Gal on the cell 
surface (Siaα2,6-TRBCs; Supplementary Fig. 2). The virus library 
was then adsorbed to Siaα2,6-TRBCs at 4 °C and extensively washed 
to remove nonspecifically or weakly bound viruses. Bound viruses were 
eluted by incubation at 37 °C for 30 min, and then diluted to 
approximately 0.5 viruses per well (on the basis of a pilot experiment that 
assessed the approximate number of eluted viruses). We screened 
one-third of the library (that is, 2.1 × 10⁶ viruses) in three separate selection 
experiments (that is, 0.7 × 10⁶ viruses per experiment) and isolated 
370 viruses that bound to Siaα2,6-TRBCs (Supplementary Fig. 2). 
Individual viruses were then grown in Madin-Darby canine kidney 
(MDCK) cells modified to overexpress Siaα2,6Gal (AX4 cells’’), and 
screened again for their ability to agglutinate Siaα2,6-TRBCs 
(Supplementary Fig. 2). The parental control virus (designated VN1203/ 
PR8) with avian-type receptor-binding specificity agglutinated untreated 
TRBCs (which possess both human- and avian-type receptors on their 
surface), but not TRBCs possessing predominantly human-type 
receptors (Siaα2,6-TRBCs; Supplementary Table 1). By contrast, of the 370 
viruses originally isolated, nine agglutinated Siaα2,6-TRBCs, albeit 
with different efficiencies (Supplementary Table 1). All nine viruses 
possessed mutations in the region targeted for random mutagenesis; 
one mutant also possessed an additional mutation (E119G) in an area 
that was not targeted for mutation. Most of the mutations clustered 
around the receptor-binding pocket (Fig. 1a). Several of the selected 
viruses possessed mutations known to increase binding to human- 
type receptors, including N186K (ref. 9), S227N (ref. 5) and Q226L 
(which confers human-type receptor binding together with G228S)’° 
(all shown in blue in Fig. 1a). The identification of known deter- 
minants of human-type receptor-binding specificity from a library of 
random mutants validates our approach. Notably, our screen also 
identified mutations not previously associated with receptor-binding 
specificity. 

Although viruses were diluted to ~0.5 viruses per well for amplifica- 
tion in AX4 cells, we cannot exclude the possibility that some wells were 
infected with more than one virus, resulting in mixed populations. 
To confirm the significance of the identified mutations in HA for 
human-type receptor binding, the mutations were engineered into a 
VN1203/PR8 virus (possessing an avirulent HA cleavage site sequence, 
as described earlier). All nine mutants were generated; however, after 
two passages in MDCK cells, the S136N mutation reverted to the wild-type 
sequence. This mutant was excluded from further evaluation. 
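
The concern about mixed populations can be made quantitative under one common assumption: that the number of infectious viruses per well follows a Poisson distribution with the stated mean of about 0.5. The text does not state this model; it is assumed here for illustration, and the function names are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events for a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def mixed_well_fraction(mean_per_well):
    """Fraction of virus-positive wells expected to hold two or more viruses."""
    p_empty = poisson_pmf(0, mean_per_well)
    p_single = poisson_pmf(1, mean_per_well)
    p_infected = 1.0 - p_empty
    p_multiple = 1.0 - p_empty - p_single
    return p_multiple / p_infected


fraction = mixed_well_fraction(0.5)
print(f"{fraction:.1%} of virus-positive wells would hold two or more viruses")
```

Under this assumption roughly 23% of virus-positive wells would hold more than one virus, which illustrates why each candidate mutation was re-engineered into a defined VN1203/PR8 background before further testing.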

First, we confirmed the binding of the remaining eight variants to 
Siaα2,6-TRBCs (Supplementary Table 1). For comparison, we 
included a VN1203/PR8 virus with two changes in its HA (Q226L 
and G228S) previously shown to have increased binding to 
Siaα2,6Gal®’*. Indeed, compared to the wild-type VN1203/PR8 virus, 
the Q226L/G228S mutant displayed an increased ability to bind to 

Figure 1 | Localization of amino acid changes identified in this study on the 
three-dimensional structure of the monomer of VN1203 HA (Protein Data 
Bank accession 2FK0)"°. a, Close-up view of the globular head of VN1203 HA. 
Mutations known to increase affinity to human-type receptors are shown in 
blue. Amino acid changes not previously known to affect receptor binding are 
shown in green. Additional mutations that occurred in the HA of H5 avian- 

human-type receptors. For the recreated variants, haemagglutination 
titres were higher and slightly different from the initial characteriza- 
tion, which we attribute to biological differences (the initial character- 
ization was carried out with non-concentrated cell culture supernatant 
and potentially mixed virus populations, whereas the recreated viruses 
were concentrated and purified) and to experimental differences (that 
is, differences between the TRBC batches or the efficiency of 
α2,3-sialidase treatment, or both). Collectively, however, these experiments 
demonstrate that this random mutagenesis approach allows the 
identification of hitherto unrecognized amino acid substitutions that 
permit avian virus HAs to bind to human-type receptors. 

To characterize further the receptor-binding properties of the 
selected variants, we used solid-phase binding assays in which 
sialylglycopolymers were adsorbed to plates, which were then 
incubated with virus (Fig. 2a). A virus possessing the HA and NA 
genes of the seasonal human A/Kawasaki/173/2001 (H1N1; K173) 
virus and the remaining genes from PR8 (K173/PR8) served as a 
control virus with typical human-type receptor specificity. Indeed, 
K173/PR8 preferentially bound to Siaα2,6Gal. In contrast, VN1203/ 
PR8 bound to only Siaα2,3Gal. As reported elsewhere®"®, the Q226L/ 
G228S mutations led to increased binding to Siaα2,6Gal. Variants 
I202T/R220S, W153R/T160I, N169I/H184L/I217M and H130Q/ 
K157E resembled VN1203/PR8 in their binding to glycans, despite 
the fact that these mutants weakly agglutinated Siaα2,6-TRBCs (see 
Supplementary Table 1). These viruses may have bound to glycans on 
TRBCs that were different from the Siaα2,6Galβ1,4GlcNAc used in this 
study. However, variants N186K/M230I, S227N/G228A and Q226L/ 
E231G showed an appreciable increase in binding to Siaα2,6Gal but 
also retained binding capacity for Siaα2,3Gal. Of all of the variants 
tested, only E119G/V152I/N224K/Q226L exhibited specificity for only 
Siaα2,6Gal. Thus, only one H5 HA variant with receptor-binding 
capability akin to that of seasonal influenza viruses was isolated from 
the library screen of 2.1 × 10⁶ viruses. To identify the amino acid 
change(s) responsible for the conversion from Siaα2,3Gal to 
Siaα2,6Gal recognition in the E119G/V152I/N224K/Q226L virus 
HA, we tested the amino acid changes at positions 119, 152, 224 and 
226 individually and in various combinations. Solid-phase binding 
assays demonstrated that the N224K/Q226L combination is critical 
for the shift from Siaα2,3Gal to Siaα2,6Gal recognition (Fig. 2b); 
Q226L in combination with V152I also conferred weak binding to 
α2,6-glycans. 

human reassortant viruses during replication and/or transmission in ferrets are 
shown in red. b, The positions of four mutations in the HA of H5 transmissible 
reassortant mutant virus, HA(N158D/N224K/Q226L/T318I)/CA04, are 
highlighted in red. The fusion peptide of HA is shown in cyan. All mutations are 
shown with H3 numbering. Images were created with MacPymol 
(http://www.pymol.org/). 

Figure 2 | Characterization of the receptor-binding properties of isolated 
viruses. a, Binding of VN1203 mutants to sialylglycopolymers in solid-phase 
binding assays. A human virus (K173/PR8), an avian virus (VN1203/PR8) and 
mutant VN1203/PR8 viruses were compared for their ability to bind to 
sialylglycopolymers containing either α2,3-linked (blue) or α2,6-linked (red) 
sialic acids. b, Identification of mutations that confer binding to human-type 
receptors. c, Binding of VN1203 mutant viruses to human respiratory tissues. 
K173/PR8, VN1203/PR8 and mutant VN1203/PR8 viruses were incubated 


To assess the effect of enhanced α2,6-glycan recognition on the 
attachment of viruses to human respiratory tracts, sections of tracheal 
and lung tissues were exposed to K173/PR8 (human-type receptor 
binder), VN1203/PR8 (avian-type receptor binder) and mutant 
VN1203/PR8 viruses (Fig. 2c). Because the N186K/M230I, S227N/ 
G228A, Q226L/E231G, E119G/V152I/N224K/Q226L and N224K/ 
Q226L mutants exhibited appreciable binding to Siaα2,6Gal (Fig. 2a, b), 
the attachment of these mutants was also tested. On tracheal sections, 
the K173/PR8 virus bound extensively to ciliated epithelial cells (Fig. 2c 
and Supplementary Fig. 3), whereas the VN1203/PR8 virus bound 
poorly. By contrast, on lung sections, both viruses bound extensively 
to the alveolar epithelial surface (both type I and II pneumocytes; Fig. 2c 
and Supplementary Fig. 4). The binding patterns of these viruses 
correlate with the distribution of Siaα2,3Gal (that is, avian-type receptors; 
present in lung epithelia) and Siaα2,6Gal (that is, human-type 
receptors; present in both trachea and lung epithelia) on the tissues, as 
observed with lectin staining’* (Supplementary Fig. 5). Like the human 
K173/PR8 virus, the E119G/V152I/N224K/Q226L and N224K/Q226L 

with human tissue sections and then stained with either anti-K173 antiserum 
(green) or anti-VN1203 HA antibodies (green). All sections were subsequently 
incubated with labelled secondary antibodies and Hoechst dye (blue). 
d, Characterization of the receptor-binding properties of N158D/N224K/ 
Q226L, N158D/N224K/Q226L/T318I and T318I viruses. The direct binding of 
virus to sialylglycopolymers containing either α2,3-linked (blue) or α2,6-linked 
(red) sialic acids was determined as described in panel a. 


mutants exhibited strong binding to the ciliated epithelial cells of the 
trachea (Fig. 2c and Supplementary Fig. 3). By contrast, the N186K/ 
M230I, S227N/G228A and Q226L/E231G mutants displayed little-to-no 
binding to tracheal epithelia (Fig. 2c), despite their binding to 
Siaα2,6Gal (Fig. 2a). A number of sialylated oligosaccharides with 
differing branching patterns and chain lengths are thought to be 
present on the cell surface’®. We therefore speculate that the mutants 
can recognize a short glycan structure such as Siaα2,6Galβ1,4GlcNAc, 
but may not recognize longer, more complex glycan structures, which 
are possibly required for binding to human tracheal epithelium. On the 
other hand, all mutants bound to alveolar epithelial cells (both type I 
and II pneumocytes; Fig. 2c and Supplementary Fig. 4). When the 
tissue sections were pre-treated with Arthrobacter ureafaciens sialidase 
(which cleaves all non-reducing terminally branched and unbranched 
sialic acids), virus binding to the tissues was substantially reduced 
(Supplementary Fig. 6a-c), confirming the sialic acid binding 
specificity of the virus. These data indicate that alterations in the receptor 
specificity of the E119G/V152I/N224K/Q226L and N224K/Q226L 
mutants have profound effects on virus attachment to human 
respiratory epithelium. 

In an avian H3 HA, the Q226L mutation changed the binding 
preference from avian- to human-type”’. A previous study found that 
the Q226L mutation on an H5 HA does not confer efficient binding to 
α2,6-glycans in a glycan array"; however, when tested in combination 
with G228S, increased binding to human-type receptors, but not a 
complete switch from avian- to human-type receptor-binding 
specificity, was observed’’. By contrast, here we found that Q226L in 
combination with N224K resulted in a switch from Siaα2,3Gal to 
Siaα2,6Gal binding in an H5 HA and allowed virus binding to human 
tracheal epithelia (Fig. 2c). The receptor-binding domain of HA is 
formed by the 190-helix at the top of HA, the 220-loop at the edge 
of the globular head, and the 130-loop at the other edge of the globular 
head (Fig. 1a). Crystal structure analysis revealed that the 220-loop of 
avian H5 HA is closer to the opposing 130-loop than in human H3 HA, 
indicating that a wider binding site for human H3 HA, compared to 
that of avian H5 HA, may be required to optimize contacts with the 
larger Siaα2,6-glycans”’. N224 lies on the turn leading into the 
220-loop, adjacent to position 226 (Fig. 1a). Replacement of N224 may alter 
the orientation of the 220-loop and thus optimize contacts between 
L226 and Siaα2,6Gal-containing receptors, thereby increasing the 
preference for α2,6 linkages. 

Recent studies reported that 2009 pandemic H1N1 and H5N1 
viruses show high genetic compatibility’. These two viruses have 
been isolated from pigs****, which have been considered as ‘mixing 
vessels’ for the reassortment of avian, swine and human strains. Thus, 
the coexistence of H5N1 and 2009 pandemic H1N1 viruses could 
provide an opportunity for the generation of transmissible H5 
avian-human reassortants in mammals. Therefore, we generated reassortant 
viruses possessing the mutant VN1203 HAs generated above, and the 
seven remaining gene segments from a prototype 2009 pandemic 
H1N1 virus (A/California/04/2009, CA04). Experiments with viruses 
possessing the wild-type HA cleavage site were performed in enhanced 
biosafety level 3 (BSL3+) containment laboratories approved for such 
use by the Centers for Disease Control and Prevention (CDC) and the 
United States Department of Agriculture (USDA). Because efficient 
human-to-human transmission is a critical feature of pandemic 
influenza viruses, we examined the growth and transmissibility of 
reassortant viruses in ferrets, which are widely accepted as an animal 
model for influenza virus transmissibility and pathogenesis studies. 
Because the E119G/V152I/N224K/Q226L and N224K/Q226L variants 
bound extensively to human tracheal epithelia (Fig. 2c), we generated 
by reverse genetics (rg) three H5 reassortant viruses possessing the 
VN1203 HA or mutant HAs (all with the wild-type multibasic cleavage 
site) and the remaining genes from the CA04 virus. The VN1203 HA 
mutants tested included the one containing four mutations, E119G, 
V152I, N224K and Q226L (designated rg(E119G/V152I/N224K/ 
Q226L)/CA04), and another containing two mutations, N224K and 
Q226L (designated rg(N224K/Q226L)/CA04). 
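
The variant labels used throughout (for example, N224K/Q226L) follow the usual wild-type residue, position, mutant residue convention. The sketch below shows one way such labels could be parsed and applied to a table of residues keyed by H3 position. The class and function names are hypothetical, and only the residues named in the text are included; it is an illustration of the notation, not a tool used in the study.

```python
import re
from typing import Dict, List, NamedTuple


class Substitution(NamedTuple):
    wild_type: str
    position: int  # H3 numbering, as used in the text
    mutant: str


_TOKEN = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_substitutions(label: str) -> List[Substitution]:
    """Split a label such as 'N224K/Q226L' into individual substitutions."""
    substitutions = []
    for token in label.split("/"):
        match = _TOKEN.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"not a single-residue substitution: {token!r}")
        wild_type, position, mutant = match.groups()
        substitutions.append(Substitution(wild_type, int(position), mutant))
    return substitutions


def apply_substitutions(residues: Dict[int, str],
                        substitutions: List[Substitution]) -> Dict[int, str]:
    """Return a copy of residues with each substitution applied.

    Raises ValueError if the stated wild-type residue does not match,
    which catches numbering mistakes early.
    """
    mutated = dict(residues)
    for sub in substitutions:
        if mutated.get(sub.position) != sub.wild_type:
            raise ValueError(f"position {sub.position} is not {sub.wild_type}")
        mutated[sub.position] = sub.mutant
    return mutated


# Wild-type residues at the four positions named in the text.
wild_type_sites = {119: "E", 152: "V", 224: "N", 226: "Q"}
quadruple = parse_substitutions("E119G/V152I/N224K/Q226L")
print(apply_substitutions(wild_type_sites, quadruple))
```

Keying residues by position rather than by string index avoids off-by-offset errors, because H3 numbering does not correspond to the index of a residue in the H5 sequence.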

To determine whether the introduced HA mutations affected the 
replication of the H5 reassortant viruses, six ferrets were inoculated 
intranasally with 10⁶ plaque-forming units (p.f.u.) of virus. On day 3 
after infection, a recombinant virus whose genes all came from CA04, 
rgCA04, replicated efficiently in the respiratory organs of infected 
animals, and was isolated from the colon, but not from any other 
organs tested (Fig. 3 and Supplementary Table 2). A virus possessing 
H5 VN1203 HA and the remaining genes from CA04 (designated 
rgVN1203/CA04) replicated to titres comparable to those of rgCA04 
in nasal turbinates, but substantially less in the lungs. By contrast, the 
two H5 reassortant viruses with HA mutations (rg(E119G/V152I/ 
N224K/Q226L)/CA04 and rg(N224K/Q226L)/CA04) were severely 
limited in their replicative ability in trachea. Although virus titres in 
nasal turbinates and lung were not statistically different between 
rg(N224K/Q226L)/CA04 and rgCA04, the virus titre in nasal 
turbinates was significantly lower in animals inoculated with rg(E119G/ 

Figure 3 | Virus replication in respiratory organs. Ferrets were infected 
intranasally with 10⁶ p.f.u. of virus. Three ferrets per group were killed on days 3 
and 6 after infection for virus titration. Virus titres in nasal turbinates, trachea 
and lung were determined by use of a plaque assay on MDCK cells. Horizontal 
bars show the mean. Asterisks indicate virus titres significantly different from 
that of rgCA04 (Dunnett’s test; P < 0.05). 


V152I/N224K/Q226L)/CA04 than in animals inoculated with rgCA04 
(Dunnett’s test; P = 0.0002; Fig. 3). Notably, rgVN1203/CA04 (avian- 
type receptor binder) replicated efficiently in nasal turbinates of 
ferrets, which have a similar sialic acid receptor distribution pattern 
to that of the human respiratory tract”**°. The reason for this discrep- 
ancy is unclear; however, replication of avian H5N1 viruses in ferret 
nasal turbinates has been reported'*”’. 

Although virus titres in respiratory organs were generally lower on 
day 6 after infection than on day 3 after infection, rg(N224K/Q226L)/ 
CA04 still showed high levels of replication at day 6 after infection; 
titres in nasal turbinates ranged from 10*° to 10°" p.f.u. g⁻¹ (Fig. 3). 
Sequence analysis of viruses in nasal turbinates on day 6 after infection 
revealed that viruses in ferret 2 and ferret 3 possessed N158D and 
N158K mutations in their HA (in addition to the original two 
mutations), respectively, leading to the loss of the glycosylation site at 
position 158 (that is, 158N-S-T to 158D-S-T or 158K-S-T; Fig. 1a and 
Supplementary Table 3). In nasal turbinates on day 6 after infection, 
the titre of the virus with the N158D/N224K/Q226L mutations 
(10° p.f.u. g⁻¹; see Fig. 3, ferret 2 of rg(N224K/Q226L)/CA04) was 
approximately four orders of magnitude higher than that of the 
original rg(N224K/Q226L)/CA04 (10*° p.f.u. g⁻¹; Fig. 3, ferret 1 of 
rg(N224K/Q226L)/CA04), whereas the virus with the N158K/ 
N224K/Q226L mutations (10°° p.f.u. g⁻¹; Fig. 3, ferret 3 of 
rg(N224K/Q226L)/CA04) grew to one order of magnitude higher than 
the original mutant. These data indicate that the additional mutation 
N158D improved the replication of rg(N224K/Q226L)/CA04 in ferrets. 
To test the effect of this mutation on the replication of H5 reassortant 
viruses in ferrets, we examined the replicative ability of a virus with 
the triple N158D/N224K/Q226L HA substitutions in ferrets. This 
HA(N158D/N224K/Q226L)/CA04 virus replicated efficiently in 
infected animals, except in the trachea (Fig. 3 and Supplementary 
Table 2). On day 3 after infection, this virus was isolated from the brain 
of two of the three animals tested, although we did not observe neuro- 
logical signs in these animals. These results indicate that the N158D 
mutation contributed to the efficient growth in the nasal turbinates of 
ferrets of an H5 reassortant virus with the N224K/Q226L mutations. 
Removal of the glycosylation site at position 158 has been reported to 
result in enhanced binding of H5N1 viruses to human-type receptors 
in combination with the Q226L/G228S mutations’. A previous study 
showed that H5N1 viruses lacking this glycosylation site transmit effi- 
ciently by direct contact among guinea-pigs’'. By contrast, H5N1 
viruses that acquire this glycosylation site lose the ability to transmit 
among guinea-pigs. Therefore, we speculated that the loss of the gly- 
cosylation site in HA(N158D/N224K/Q226L)/CA04 virus may affect 
its transmissibility in ferrets. 

To assess the ability of H5 reassortant viruses with human-type 
receptor specificity to transmit between ferrets, we placed naive ferrets 
in wireframe cages next to ferrets inoculated with 10⁶ p.f.u. of rgCA04, 
rgVN1203/CA04, rg(N224K/Q226L)/CA04, or HA(N158D/N224K/ 
Q226L)/CA04 (Supplementary Fig. 7). Similar to previous experi- 
ments”, rgCA04 was efficiently transmitted via respiratory droplets 
to all three contact ferrets, as evidenced by the detection of virus in 
nasal washes and haemagglutination inhibition (HI) antibody in these 
animals (Table 1 and Fig. 4). By contrast, rgVN1203/CA04 and 
rg(N224K/Q226L)/CA04 were not transmitted; neither virus shedding 
nor seroconversion was detected in any contact animals, despite the 
binding of the latter to Siaα2,6Gal. This result was consistent with that 
of previous studies in which human-type receptor recognition was 
shown to be necessary but not sufficient for respiratory droplet trans- 
mission of an H5N1 virus in a ferret model’*"*. In the HA(N158D/ 
N224K/Q226L)/CA04-inoculated group, virus was recovered from 
two of the six contact ferrets (pairs 1 and 2) between days 5 and 7 after 
contact. Moreover, seroconversion was detected in five animals 
including those from which virus was recovered. No animals died in 
the course of these transmission experiments. This finding demon- 
strates the generation of an H5 HA that supports virus transmission by 
respiratory droplets among ferrets. 

To determine whether additional mutations occurred in the HA of 
HA(N158D/N224K/Q226L)/CA04 during transmission, viral RNA was 
analysed from nasal washes of inoculated and contact ferrets (Fig. 4 and 
Supplementary Table 4). On day 5 after infection, the A242S and T318I 
mutations in HA were present in five (pairs 1, 3, 4, 5 and 6) and one (pair 
2) of the six inoculated animals, respectively. Viruses derived from the 
contact animals of pair 1 on day 7 after contact had two changes in HA 
(K193N and A242S) (Fig. 1a), whereas those derived from the contact 
animals of pair 2 contained a single change in HA (T318I) (Fig. 1b), 
indicating that additional changes in HA occurred during the infection 
of ferrets with HA(N158D/N224K/Q226L)/CA04. No mutations in the 
remaining genes were detected in any of these viruses from nasal washes 
compared with the CA04 virus sequences. 

Because HA(N158D/N224K/Q226L)/CA04 was isolated from only 
one-third of the contact animals, we isolated a virus from the nasal 
wash of the contact ferret that shed a high titre (107° p-fu. ml‘) of 
virus on day 7 after contact (pair 2) (Fig. 4d) to evaluate the replication 
and transmissibility of that virus in ferrets. This mutant virus, desig- 
nated HA(N158D/N224K/Q226L/T318I)/CA04, replicated efficiently 
in the nasal turbinates and was isolated from brain tissue (Fig. 3 and 
Supplementary Table 2). In the transmission study, four of the six 
contact ferrets were positive for virus between days 3 and 7 after 
contact, and all contact animals were seropositive; no animals died 
in the course of the transmission experiments (Table 1; Fig. 4e and 
Supplementary Fig. 8). Notably, this transmission pattern is compar- 
able to that of the 1918 pandemic H1N1 virus when tested under the 
same experimental conditions; the 1918 pandemic virus was recovered 
from the nasal wash of two of three contact animals (our own unpub- 
lished data). Sequence comparison of viruses from inoculated and 
contact animals identified mutations at positions 225 and 242 as well 
as a reversion at position 224 (Fig. 1a and Supplementary Table 5) (in 
addition to the original four mutations) although the 224 reversion was 
found only in viruses from inoculated ferrets. Collectively, these find- 
ings demonstrate that four amino acid substitutions (N158D/N224K/ 
Q226L/T318I) in H5 HA confer efficient respiratory droplet transmis- 
sion in ferrets to a virus possessing an H5 HA in a 2009 pandemic 
H1N1 backbone. We also confirmed that recombinant viruses possessing
the three HA mutations N158D, N224K and Q226L, or the four 
HA mutations N158D, N224K, Q226L and T318I, and the NA of 
VN1203 in a PR8 background (designated N158D/N224K/Q226L or 
N158D/N224K/Q226L/T318I, respectively) preferentially bind to 
Siaα2,6Gal and attach to human tracheal epithelia (Fig. 2c, d). 

HA(N158D/N224K/Q226L/T318I)/CA04 transmitted by respiratory
droplet more efficiently than HA(N158D/N224K/Q226L)/ 
CA04, raising the possibility that the T318I mutation is involved in 
the efficient transmission of avian H5N1/pandemic H1N1 reassor- 
tants. To explore the functional role of this mutation in respiratory 
droplet transmission, we generated an H5 reassortant expressing the 
H5 HA with the T318I mutation and examined its receptor-binding 
specificity and transmissibility. This reassortant (designated rgT318I/ 
CA04) bound only to Siaα2,3Gal and showed little binding to human 
tracheal epithelia (Fig. 2c, d). rgT318I/CA04 did not transmit via 
respiratory droplet among ferrets (Table 1 and Fig. 4f), although it 
replicated in nasal turbinates and trachea as efficiently as rgCA04 
(Fig. 3 and Supplementary Table 2). These results indicate that the 
T318I mutation alone is not sufficient for H5 reassortant viruses to 
transmit efficiently among ferrets. 


Table 1 | Transmission in ferrets inoculated with H5 avian-human reassortant viruses 

Virus | Inoculated ferrets: weight loss (%)* | Inoculated ferrets: peak virus titre in nasal wash, mean log₁₀(p.f.u. ml⁻¹) (days after inoculation) | Inoculated ferrets: seroconversion, positive and total numbers (HI titre)† | Contact ferrets: virus detection in nasal wash, positive and total numbers | Contact ferrets: seroconversion, positive and total numbers (HI titre)
rgCA04 | 3 of 3 (15.1) | 7.5 (1) | 3 of 3 (≥1,280, ≥1,280, ≥1,280) | 3 of 3 | 3 of 3 (≥1,280, ≥1,280, ≥1,280)
rgVN1203/CA04 | 3 of 3 (5.9) | 5.3 (5) | 3 of 3 (80, 40, 80) | 0 of 3 | 0 of 3 (<10, <10, <10)
rg(N224K/Q226L)/CA04 | 2 of 3 (7.8)‡ | 3.9 (5) | 3 of 3 (≥1,280, ≥1,280, ≥1,280) | 0 of 3 | 0 of 3 (<10, <10, <10)
HA(N158D/N224K/Q226L)/CA04 | 6 of 6 (5.7) | 6.7 (3) | 6 of 6 (640, ≥1,280, ≥1,280, 640, ≥1,280, ≥1,280) | 2 of 6 | 5 of 6 (160, 320, 20, 160, 40, <10)
HA(N158D/N224K/Q226L/T318I)/CA04 | 6 of 6 (9.8) | 6.1 (5) | 6 of 6 (≥1,280, ≥1,280, 640, ≥1,280, ≥1,280, ≥1,280) | 4 of 6 | 6 of 6 (640, 640, ≥1,280, 80, ≥1,280, 320)
rgT318I/CA04 | 3 of 5 (1.5)§ | 5.6 (3) | 5 of 5 (40, 20, 20, 40, 40) | 0 of 5 | 0 of 5 (<10, <10, <10, <10, <10)

* Maximum percentage weight loss is shown. 
† Haemagglutination inhibition (HI) assays were carried out with homologous virus and turkey red blood cells. 
‡ One animal did not lose any body weight. 
§ Two animals did not lose any body weight. 
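
The contact-ferret seroconversion counts in Table 1 can be re-derived from the listed HI titres. The sketch below is illustrative only: it assumes that any HI titre of 10 or more counts as seroconversion (the table reports negatives as "<10"), stores "<10" as 0 and "≥1,280" as 1280, and uses the hypothetical names `CONTACT_HI_TITRES` and `seroconversion`.

```python
# Contact-ferret HI titres transcribed from Table 1.
# Assumptions: "<10" is stored as 0 and ">=1,280" is stored as 1280.
CONTACT_HI_TITRES = {
    "rgCA04": [1280, 1280, 1280],
    "rgVN1203/CA04": [0, 0, 0],
    "rg(N224K/Q226L)/CA04": [0, 0, 0],
    "HA(N158D/N224K/Q226L)/CA04": [160, 320, 20, 160, 40, 0],
    "HA(N158D/N224K/Q226L/T318I)/CA04": [640, 640, 1280, 80, 1280, 320],
    "rgT318I/CA04": [0, 0, 0, 0, 0],
}

# Assumption: a titre at or above the lowest reported dilution (10) is positive.
POSITIVE_THRESHOLD = 10


def seroconversion(titres, threshold=POSITIVE_THRESHOLD):
    """Return (number of seropositive animals, total animals)."""
    positive = sum(1 for titre in titres if titre >= threshold)
    return positive, len(titres)


def summarize(table):
    """Map each virus name to its (positive, total) seroconversion count."""
    return {virus: seroconversion(titres) for virus, titres in table.items()}


if __name__ == "__main__":
    for virus, (positive, total) in summarize(CONTACT_HI_TITRES).items():
        print(f"{virus}: {positive} of {total}")
```

Under these assumptions the counts agree with the "positive and total numbers" column for contact ferrets.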
Figure 4 | Respiratory droplet transmission of H5 avian-human 
reassortant viruses in ferrets. a–f, Groups of three, five, or six ferrets were 
inoculated intranasally with 10⁶ p.f.u. of rgCA04 (a), rgVN1203/CA04 
(b), rg(N224K/Q226L)/CA04 (c), HA(N158D/N224K/Q226L)/CA04 
(d), HA(N158D/N224K/Q226L/T318I)/CA04 (e), or rgT318I/CA04 (f). One 
day after infection, three, five, or six naive ferrets were placed in adjacent cages. 
Nasal washes were collected every other day from both inoculated (left panel) 
and contact (right panel) animals for virus titration. Virus titres in organs were 
determined by using a plaque assay on MDCK cells. The lower limit of 
detection is indicated by the horizontal dashed line. 


Influenza virus HA protein has membrane-fusion as well as receptor- 
binding activity. Notably, in the three-dimensional model of influenza 
A virus HA, residue 318 is located proximally to the fusion peptide 
(Fig. 1b), which has key roles in the membrane fusion process. To 
assess the effect of HA mutations on low-pH-induced membrane 
fusion activity, we examined the pH at which the fusion activity of 
wild-type and mutant HA was activated (Fig. 5). The wild-type HA had 
a threshold for membrane fusion of pH 5.7; the N224K/Q226L and 
N158D/N224K/Q226L mutations raised the threshold for fusion to 
>pH5.9, whereas the T318I mutation reduced the threshold for 
fusion to pH 5.5. The N158D/N224K/Q226L/T318I mutations showed 
wild-type fusogenic properties (that is, a threshold at pH 5.7). 

Figure 5 | Polykaryon formation by HeLa cells expressing wild-type or 
mutant HAs after acidification at low pH. a, The efficiency of polykaryon 
formation over a pH range of 5.4–6.0 was estimated from the number of nuclei 
in polykaryons divided by the total number of nuclei in the same field. The 
mean and standard deviations determined from five randomly chosen fields of 
cell culture are shown. Single asterisks indicate values significantly different 
between the wild-type HA and the N224K/Q226L or N158D/N224K/Q226L 
HA (Tukey test; P < 0.05). The double asterisk indicates values significantly 
different between the T318I HA and the N224K/Q226L or N158D/N224K/ 
Q226L HA (Tukey test; P < 0.05). b, Representative fields of cells expressing the 
indicated HAs and exposed to pH 5.4, 5.6, or 5.8 are shown. Images were taken 
at ×10 magnification. 

The HA 
of influenza virus undergoes a low-pH-dependent conformational 
change, which is required for fusion of the viral envelope with the 
target membrane”’. Such a conformational change to a fusion-active 
form can also lead to viral inactivation. Therefore, sustained and effi- 
cient human-to-human transmission of virus may require a certain 
level of stability of the HA protein in an acidic environment, as the pH 
of human nasal mucosa, where human influenza viruses replicate 
primarily, is approximately pH 5.5-6.5 (ref. 34). Our findings suggest 
that an increase in the pH threshold for fusion as a result of the N224K/ 
Q226L mutations that shift the HA receptor recognition from avian- 
type to human-type may reduce HA protein stability; however, the 
T318I mutation decreases the pH threshold for fusion activity, resulting
in a stable mutant HA. 
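
The Figure 5 legend defines polykaryon-formation efficiency as the number of nuclei in polykaryons divided by the total number of nuclei in the same field, summarized as a mean and standard deviation over five fields. A minimal sketch of that arithmetic follows; the nucleus counts are hypothetical, the function names are illustrative, and the use of the sample standard deviation is an assumption, since the legend does not say which estimator was used.

```python
from statistics import mean, stdev


def fusion_efficiency(nuclei_in_polykaryons, total_nuclei):
    """Fraction of nuclei in a field that lie inside polykaryons."""
    if total_nuclei <= 0:
        raise ValueError("total_nuclei must be positive")
    if not 0 <= nuclei_in_polykaryons <= total_nuclei:
        raise ValueError("nuclei_in_polykaryons must be between 0 and total_nuclei")
    return nuclei_in_polykaryons / total_nuclei


def summarize_fields(fields):
    """Return (mean, sample standard deviation) of per-field efficiencies."""
    efficiencies = [fusion_efficiency(p, t) for p, t in fields]
    return mean(efficiencies), stdev(efficiencies)


if __name__ == "__main__":
    # Hypothetical counts for five fields: (nuclei in polykaryons, total nuclei).
    example_fields = [(30, 100), (45, 120), (28, 90), (50, 110), (33, 95)]
    avg, sd = summarize_fields(example_fields)
    print(f"mean efficiency = {avg:.3f}, s.d. = {sd:.3f}")
```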

Because heat treatment at neutral pH is also known to promote a 
fusogenic form of HA protein**”*’ and serve as a surrogate assay for HA 
stability’’, we next tested whether the HA mutations described above 
affect the heat stability of the HA protein. Wild-type and mutant HA 
viruses were incubated at 50 °C for various times, after which the loss 
of infectivity and haemagglutination activity were determined. The 
wild-type and N224K/Q226L viruses lost most of their infectivity by 
heating for 60 min (>5.5-log₁₀ decrease in titre; Fig. 6a), whereas the 
N158D/N224K/Q226L and N158D/N224K/Q226L/T318I mutants 
exhibited considerable tolerance to high temperature (3.9- and 
3.4-log₁₀ decrease after a 60-min incubation, respectively) and the 
T318I mutant was most resistant (only a 1.4-log₁₀ decrease under 
the same conditions). 

Figure 6 | Effect of heat treatment on the infectivity and haemagglutination 
activity of viruses. Aliquots of a virus stock containing 128 HA units were 
incubated for the times indicated at 50 °C. a, Virus titres in heat-treated samples 
were determined by plaque assays on MDCK cells. b, Haemagglutination titres 
in heat-treated samples were determined by using haemagglutination assays 
with 0.5% TRBCs. Each point represents the mean ± standard deviation from 
triplicate experiments. 

In haemagglutination assays, the N224K/ 
Q226L mutant HA lost activity more rapidly than did the wild-type 
HA, and N158D/N224K/Q226L lost activity more rapidly than did the 
N158D/N224K/Q226L/T318I mutant (Fig. 6b). Thus, addition of the 
N158D mutation to the N224K/Q226L HA increased HA stability and 
subsequent addition of the fourth mutation, T318I, rendered the HA 
protein even more stable. Taken together, these results suggest that the 
addition of the T318I mutation to H5 HAs that preferentially recog- 
nize human-type receptors restores HA protein stability, thereby 
allowing a virus carrying the N158D/N224K/Q226L/T318I mutations 
in HA to transmit efficiently via respiratory droplet among ferrets. In 
conclusion, a fine balance of mutations affecting different functions in 
HA (such as receptor-binding specificity and HA stability) may be 
critical to confer transmissibility in ferrets. 
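
The heat-stability results above are expressed as log₁₀ decreases in titre. As a reminder of that arithmetic, the sketch below computes the decrease from a titre before and after treatment; the function name and the example titres are hypothetical and are not data from this study.

```python
import math


def log10_reduction(titre_before, titre_after):
    """Return the log10 decrease in titre (positive when infectivity is lost)."""
    if titre_before <= 0 or titre_after <= 0:
        raise ValueError("titres must be positive")
    return math.log10(titre_before) - math.log10(titre_after)


if __name__ == "__main__":
    # Hypothetical titres in p.f.u. per ml before and after heat treatment.
    before, after = 1.0e7, 1.0e3
    print(f"log10 decrease = {log10_reduction(before, after):.1f}")
```

A sample that falls below the assay's detection limit can only be reported as "greater than" some decrease, which is how the >5.5-log₁₀ value in the text should be read.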

We next compared the pathogenicity in ferrets of H5 avian-human 
reassortants with that of the pandemic H1N1 virus CA04 (Fig. 7, 
Supplementary Information and Supplementary Figs 9-11). The 
control virus, rgCA04, caused substantial body weight loss (15.1%) 
(Table 1 and Supplementary Fig. 9). By contrast, the four reassortant 
viruses caused only modest weight loss (<10%) in most of the animals. 
However, no statistically significant differences in body weight loss 
were found between the reassortant viruses and rgCA04. Pathological 
examination revealed similar histological changes and levels of viral 
antigens in the nasal mucosa of rgCA04-, HA(N158D/N224K/Q226L)/ 
CA04- and HA(N158D/N224K/Q226L/T318I)/CA04-infected ferrets 
(Fig. 7a, b). In the rgVN1203/CA04 and rg(N224K/Q226L)/CA04 
groups, however, less tissue damage was found in the nasal mucosa 
compared with the rgCA04 group on day 3 after infection (Dunnett’s 
test; P = 0.0057 and 0.0175, respectively; Fig. 7b). In addition, all three 
viruses caused lung lesions (Supplementary Information and 
Supplementary Figs 10 and 11). 

Figure 7 | Pathological analyses of H5 avian-human reassortant viruses. 
a, Representative histological changes in nasal turbinates from 
influenza-virus-infected ferrets. Three ferrets per group were infected 
intranasally with 10⁶ p.f.u. of virus, and tissues were collected on day 3 after 
infection for pathological examination. Uninfected ferret tissues served as 
negative controls (normal). Left panel, haematoxylin-and-eosin staining. Right 
panel, immunohistochemical staining for viral antigen detection (brown 
staining). Scale bars, 50 µm. b, Pathological severity scores in infected 
ferrets. To represent comprehensive histological changes, respiratory tissue 
slides were evaluated by scoring the pathological changes and viral antigen 
expression levels. The pathological scores were determined for each animal in 
each group (n = 3 per group on days 3 and 6 after infection) using the 
following scoring system: 0, no pathological change/antigen negative; 
1, affected area (<30%) or only interstitial lesion/rare viral antigens; 
2, affected area (<80%, ≥30%)/moderate viral antigens; 3, severe lesion 
(≥80%)/many viral antigens. Nasal, pathological changes in the nasal 
mucosa; nasal Ag, viral antigens in the nasal mucosa. Asterisks indicate virus 
pathological scores significantly different from that of rgCA04 
(Dunnett’s test; P < 0.05). Error bars denote standard deviation. 
Legend entries: rgCA04; rgVN1203/CA04; rg(N224K/Q226L)/CA04; 
HA(N158D/N224K/Q226L)/CA04; HA(N158D/N224K/Q226L/T318I)/CA04. 

©2012 Macmillan Publishers Limited. All rights reserved 
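
The area-based part of the severity scoring described in the Figure 7 legend can be written as a simple threshold function. This is a sketch only: it reads the legend's cut-offs as <30%, 30% to <80%, and ≥80%, it ignores the qualitative criteria (interstitial lesions, antigen abundance) that the legend also lists, and the function name is hypothetical.

```python
def area_score(percent_affected):
    """Map the percentage of affected tissue area to a 0-3 severity score.

    Assumed reading of the legend: 0 = no change, 1 = affected area < 30%,
    2 = affected area >= 30% and < 80%, 3 = affected area >= 80%.
    """
    if not 0 <= percent_affected <= 100:
        raise ValueError("percent_affected must be between 0 and 100")
    if percent_affected == 0:
        return 0
    if percent_affected < 30:
        return 1
    if percent_affected < 80:
        return 2
    return 3


if __name__ == "__main__":
    for area in (0, 10, 30, 79.9, 80, 100):
        print(f"{area}% affected -> score {area_score(area)}")
```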

To assess whether current control measures may be effective against 
the H5 transmissible reassortant mutant virus, we examined the reac- 
tivity of sera from individuals vaccinated with an H5N1 prototype 
vaccine against a virus possessing the N158D/N224K/Q226L/ 
T318I mutations in HA. We found that pooled human sera from 
individuals immunized with this vaccine reacted with the virus possessing
the mutant H5 HA (N158D/N224K/Q226L/T318I) at a higher 
titre than with a wild-type H5 HA virus (VN1203/PR8; Supplementary 
Table 6), indicating that current H5N1 vaccines would be efficacious 
against the H5 transmissible reassortant mutant virus. In addition, the 
H5 transmissible reassortant mutant virus (HA(N158D/N224K/ 
Q226L/T318I)/CA04) was highly susceptible to a licensed NA inhibitor, 
oseltamivir (Supplementary Table 7). These experiments show that 
appropriate control measures would be available to combat the trans- 
missible virus described in this study. 

Currently, we do not know whether the mutations that we identified 
in this study that allowed the HA(N158D/N224K/Q226L/T318I)/ 
CA04 virus to be transmissible in ferrets would also support sustained 
human-to-human transmission. In particular, we wish to emphasize 
that the transmissible HA(N158D/N224K/Q226L/T318I)/CA04 virus 
possesses seven segments (all but the HA segment) from a human 
pandemic 2009 H1N1 virus. Human-virus-characteristic amino acids 
in these seven segments may have critically contributed to the respiratory
droplet transmission of the HA(N158D/N224K/Q226L/T318I)/ 
CA04 virus in ferrets. Examples include amino acids in the PB2 
polymerase protein that confer efficient replication in mammalian, 
but not avian, cells. As the PB2 gene of the HA(N158D/N224K/ 
Q226L/T318I)/CA04 virus is of human virus origin, the virus 
possesses high replicative ability in mammalian cells. In contrast, most 
avian virus PB2 proteins lack these human-type amino acids, although 
one of these changes (a glutamic-acid-to-lysine mutation at position 
627) is found in highly pathogenic avian H5N1 viruses circulating in 
the Middle East**. As a second example, the viral NA gene may contribute
to viral transmissibility. The NA protein cleaves α-ketosidic 
linkages between a terminal sialic acid and an adjacent sugar residue, 
an activity that balances the sialic-acid-binding activity of HA. A 
recent study found that a human virus NA gene was critical to confer 
limited transmissibility to a mutant H5 avian-human reassortant 
virus'*. In general, a human-type receptor recognizing H5 HA alone 
may not be sufficient to confer transmissibility in mammals, but may 
have to act together with other human-virus-characteristic traits (in 
PB2, NA, and/or other viral proteins). Therefore, at this point we 
cannot predict whether the four mutations in the H5 HA identified 
here would render a wholly avian H5N1 virus transmissible. 

Three of the residues identified here (N224, Q226 and T318) have 
been strictly conserved among H5 HA proteins isolated since 2003. 
However, as H5N1 viruses continue to evolve and infect people, 
receptor-binding variants of H5N1 viruses, including avian-human 
reassortant viruses as tested here, may emerge. One of the four muta- 
tions we identified in our transmissible virus, the N158D mutation, 
results in loss of a glycosylation site. Many H5N1 viruses isolated in the 
Middle East, Africa, Asia and Europe do not have this glycosylation 
site. Therefore, only three nucleotide changes are needed for the HA of 
these viruses to support efficient transmission in ferrets. In addition, 
the H5N1 viruses circulating in these geographic areas also possess a 
glutamic-acid-to-lysine mutation at position 627 in the PB2 protein, 
which promotes viral replication in certain mammals, including 
humans***. Therefore, these viruses may be several steps closer to 
those capable of efficient transmission in humans and are of concern. 

Our study highlights the pandemic potential of viruses possessing 
an H5 HA. Although current vaccines may protect against a virus 
similar to that tested here, the continued evolution of H5N1 viruses 
reinforces the need to prepare and update candidate vaccines to H5 
viruses. The amino acid changes identified here will help individuals 
conducting surveillance in regions with circulating H5N1 viruses (for 
example, Egypt, Indonesia, Vietnam) to recognize key residues that 
predict the pandemic potential of isolates. Rapid responses in a poten- 
tial pandemic situation are essential in order to generate appropriate 
vaccines and initiate other public health measures to control infection. 
Furthermore, our findings are of critical importance to those making 
public health and policy decisions. 

Our research answers a fundamental question in influenza research: 
can H5-HA-possessing viruses support transmission in mammals? 
Moreover, our findings have suggested that different mechanisms 
(that is, receptor-binding specificity and HA stability) may act in con- 
cert for efficient transmissibility in mammals. This knowledge will 
facilitate the identification of additional mutations that affect viral 
transmissibility; the monitoring of this expanded set of changes in 
natural isolates may improve our ability to assess the pandemic poten- 
tial of H5N1 viruses. Thus, although a pandemic H5N1 virus may not 
possess the amino acid changes identified in our study, the findings 
described here will advance our understanding of the mechanisms and 
evolutionary pathways that contribute to avian influenza virus trans- 
mission in mammals. 


METHODS SUMMARY 

Viruses. All recombinant viruses were generated by using reverse genetics essen- 
tially as described previously’®. All experiments with the viruses possessing the wild- 
type HA cleavage site were performed in an enhanced biosafety level 3 (BSL3+) 
containment laboratory approved for such use by the CDC and the USDA. 
Infection and transmission in ferrets. Six- to ten-month-old female ferrets (Triple 
F Farms) were anaesthetized intramuscularly and inoculated intranasally with 
10⁶ p.f.u. (500 µl) of virus. On days 3 and 6 after infection, ferrets were killed for 
virological and pathological examinations. The virus titres in various organs were 
determined by use of plaque assays in MDCK cells. 

For transmission studies in ferrets, animals were housed in adjacent transmission 
cages that prevented direct and indirect contact between animals but allowed spread 
of influenza virus through the air (Showa Science; Supplementary Fig. 7). Ferrets 
were intranasally inoculated with 10⁶ p.f.u. (500 µl) of virus (inoculated ferrets). 
Twenty-four hours after infection, naive ferrets were each placed in a cage adjacent 
to an inoculated ferret (contact ferrets). To assess viral replication in the nasal 
turbinates, we determined viral titres in nasal washes collected from virus-inocu- 
lated and contact ferrets on day 1 after inoculation or co-housing, respectively, and 
then every other day. Animal studies were performed in accordance with Animal 
Care and Use Committee guidelines of the University of Wisconsin-Madison. 
Biosafety and biosecurity. All recombinant DNA protocols were approved by the 
University of Wisconsin-Madison’s Institutional Biosafety Committee after risk 
assessments were conducted by the Office of Biological Safety, and by the 
University of Tokyo’s Subcommittee on Living Modified Organisms, and, when 
required, by the competent minister of Japan. In addition, the University of 
Wisconsin-Madison Biosecurity Task Force regularly reviews the research pro- 
gram and ongoing activities of the laboratory. The task force has a diverse skill set 
and provides support in the areas of biosafety, facilities, compliance, security and 
health. Members of the Biosecurity Task Force are in frequent contact with the 
principal investigator and laboratory personnel to provide oversight and assure 
biosecurity. Experiments with viruses possessing the wild-type HA cleavage site 
were performed in enhanced BSL3 containment laboratories approved for such 
use by the CDC and the USDA. Ferret transmission studies were conducted by 
three scientists with both DVM and PhD degrees who each had more than 
6 years of experience with highly pathogenic influenza viruses and 
animal studies with highly pathogenic viruses. Our staff wear powered air-puri- 
fying respirators that filter the air, and disposable coveralls; they shower out on exit 
from the facility. The containment facilities at University of Wisconsin-Madison 
were designed to exceed standards outlined in Biosafety in Microbiological and 
Biomedical Laboratories (5th edition; http://www.cdc.gov/biosafety/publications/ 
bmbl5/BMBL.pdf). Features of the BSL3-enhanced suites include entry/exit 
through a shower change room, effluent decontamination, negative air-pressure 
laboratories, double-door autoclaves, double HEPA-filtered exhaust air, and gas 
decontamination ports. The BSL3-Agriculture suite features include all those 
listed for BSL3-enhanced plus HEPA-filtered supply and double-HEPA-filtered 
exhaust air, double-gasketed watertight and airtight seals, airtight dampers on all 
ductwork, and the structure was pressure-decay tested during commissioning. 
The University of Wisconsin-Madison facility has a dedicated alarm system that 
monitors all building controls and sends alarms (~500 possible alerts). 
Redundancies and emergency resources are built into the facility, including two 
air handlers, two compressors, two filters at each place where filters are needed, two effluent 
sterilization tanks, two power feeds to the building, an emergency generator in case 
of a power failure and other physical containment measures in the facility that 
operate without power. Biosecurity monitoring of the facility is ongoing. All 
personnel undergo Select Agent security risk assessment by the United States 
Criminal Justice Information Services Division and complete rigorous biosafety, 
BSL3 and Select Agent training before participating in BSL3-level experiments. 
Refresher training is scheduled on a regular basis. The principal investigator 
participates in training sessions and emphasizes compliance to maintain safe 
operations and a responsible research environment. The laboratory occupational 
health plan is in compliance with the University of Wisconsin-Madison 
Occupational Health Program. Select agent virus inventory is checked monthly 
and submitted to the University of Wisconsin-Madison Research Compliance 
Specialist. Virus inventory is submitted 1-2 times per year to the file holder in 
the Select Agent branch of the CDC. The research program, procedures, occu- 
pational health plan, documentation, security and facilities are reviewed annually 
by the University of Wisconsin-Madison Responsible Official and at regular
intervals by the CDC and the Animal and Plant Health Inspection Service (APHIS) as 
part of the University of Wisconsin-Madison Select Agent Program. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 

Cells. Madin-Darby canine kidney (MDCK) cells and MDCK cells overexpressing 
Siaα2,6Gal (AX4 cells'?) were maintained in Eagle’s minimal essential medium 
(MEM) containing 5% newborn calf serum. Human embryonic kidney 293T cells 
were cultured in Dulbecco’s modified Eagle’s medium containing 10% fetal bovine 
serum (FBS). HeLa cells were maintained in MEM containing 10% FBS. All cells 
were maintained at 37 °C in 5% CO₂. 

Plasmid construction and reverse genetics. Plasmid constructs for viral RNA 
production (pPolI), containing the genes of the A/Vietnam/1203/2004 (H5N1; 
VN1203), A/Puerto Rico/8/34 (H1N1; PR8), A/Kawasaki/173/2001 (H1N1; 
K173) and A/California/04/2009 (H1N1; CA04) viruses flanked by the human 
RNA polymerase I promoter and the mouse RNA polymerase I terminator, were 
constructed as described'®. The multibasic amino acids at the haemagglutinin 
(HA) cleavage site (RERRRKKR|G) of the reassortant viruses between VN1203 
and PR8 were changed to RETR|G by site-directed mutagenesis. All transfectant 
viruses were generated by using reverse genetics essentially as described previ- 
ously'®. Recombinant viruses were amplified in MDCK or AX4" cells and stored at 
—80°C until use. The HA segment of all viruses was sequenced to ensure the 
absence of unwanted mutations. All experiments with the reassortant viruses 
between VN1203 and CA04 were performed in enhanced biosafety level 3 con- 
tainment laboratories approved for such use by the CDC and the USDA. 

To introduce random mutations into the globular head of the VN1203 HA 
protein, a 143-amino-acid region spanning residues 120-259 (H3 numbering) was 
selected. This region was subjected to PCR-based random mutagenesis by use of 
the GeneMorph II kit (Stratagene) following the manufacturer’s instructions. The 
targeted mutation rate (1-2 amino acid replacements per molecule) was achieved 
through optimization of the template quantity, and was confirmed by sequence 
analysis of 48 individual clones. By using a PCR-based cloning strategy, we 
inserted the mutagenized region into its respective vector containing the 
VN1203 HA gene between the human RNA polymerase I promoter and mouse 
RNA polymerase I terminator sequences. The composition of the plasmid library 
was confirmed by sequencing. The plasmid library was then used to generate an 
influenza virus library, essentially as described’*. The size of the virus library was 
7X 10° pfu. 

Preparation of sialidase-treated TRBCs. Turkey red blood cells (TRBCs) were 
washed three times with phosphate-buffered saline (PBS), and diluted to 20% (vol/ 
vol) in PBS. TRBCs (1 ml) were incubated with 500 U of α2,3-sialidase from 
Salmonella enterica serovar Typhimurium LT2 (NEB) for 20-24 h at 37 °C, 
washed three times in PBS, and re-suspended in PBS or MEM containing 1% 
bovine serum albumin (BSA) (MEM/BSA). 

Haemagglutination assay. Viruses (50 µl) were serially diluted with 50 µl of PBS 
in a microtitre plate. An equal volume (that is, 50 µl) of a 0.5% (vol/vol) TRBC 
suspension was added to each well. The plates were kept at room temperature and 
haemagglutination was assessed after a 1-h incubation. 
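
A haemagglutination titre is conventionally read as the reciprocal of the highest dilution that still agglutinates the red blood cells. The sketch below shows that readout under stated assumptions: mixing 50 µl of virus with 50 µl of PBS is taken to give a two-fold series starting at 1:2, the endpoint is the last positive well before the first negative one, and the function name and example plate row are hypothetical.

```python
def ha_titre(wells, first_dilution=2, fold=2):
    """Return the HA titre from a serial-dilution row.

    wells: booleans ordered from least to most dilute, True where
    haemagglutination was observed. Returns 0 if the first well is negative.
    """
    titre = 0
    dilution = first_dilution
    for agglutinated in wells:
        if not agglutinated:
            break  # endpoint reached: stop at the first negative well
        titre = dilution
        dilution *= fold
    return titre


if __name__ == "__main__":
    # Hypothetical row of 12 wells: the first seven agglutinate.
    row = [True] * 7 + [False] * 5
    print(f"HA titre = {ha_titre(row)}")
```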

Virus library screening. To select VN1203 HA variants that had acquired the 
ability to recognize human-type receptors, three parallel experiments were carried 
out, each with 0.7 X 10° viruses. The virus library was first incubated with 0.1 ml of 
10% (vol/vol) α2,3-sialidase-treated TRBCs for 10 min at 4 °C. After this incubation,
the TRBCs and bound viruses were pelleted at 1,000 r.p.m. for 1 min, and the 
pellets then washed ten times in MEM/BSA containing 313 mM NaCl. Bound 
viruses were eluted by incubation at 37 °C for 30 min and then diluted to approxi- 
mately 0.5 virus per well (determined by virus titration in a pilot study). Individual 
viruses were then amplified in AX4 cells, which overexpress Siaα2,6Gal’’. 
Individual viruses were re-screened by using haemagglutination assays with 
α2,3-sialidase-treated TRBCs. 
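
Diluting to an average of about 0.5 virus per well is a standard limiting-dilution approach. If the number of viruses per well is assumed to follow a Poisson distribution (an assumption made here for illustration; the text does not state it), the expected fractions of empty, single-virus and multi-virus wells follow directly. The function names below are hypothetical.

```python
import math


def poisson_pmf(k, mean_per_well):
    """Probability of exactly k viruses in a well under a Poisson model."""
    return math.exp(-mean_per_well) * mean_per_well**k / math.factorial(k)


def well_fractions(mean_per_well):
    """Return expected well fractions for a given mean number of viruses per well."""
    empty = poisson_pmf(0, mean_per_well)
    single = poisson_pmf(1, mean_per_well)
    multiple = 1.0 - empty - single
    return {
        "empty": empty,
        "single": single,
        "multiple": multiple,
        # Share of virus-positive wells that started from one virus.
        "single_among_positive": single / (1.0 - empty),
    }


if __name__ == "__main__":
    for name, value in well_fractions(0.5).items():
        print(f"{name}: {value:.3f}")
```

Under this model, at 0.5 virus per well about 61% of wells would be empty, and roughly three-quarters of the virus-positive wells would have been seeded by a single virus.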

Solid-phase binding assay. Viruses were grown in MDCK cells, clarified by low- 
speed centrifugation, laid over a cushion of 30% sucrose in PBS, and ultracentri- 
fuged at 25,000 r.p.m. for 2h at 4°C. Virus stocks were aliquoted and stored at 
—80 °C. Virus concentrations were determined by using haemagglutination assays 
with 0.5% (vol/vol) TRBCs. The direct receptor-binding capacity of viruses was 
examined by use of a solid-phase binding assay as previously described’. Microtitre 
plates (Nunc) were incubated with the sodium salts of sialylglycopolymers (poly- 
L-glutamic acid backbones containing N-acetylneuraminic acid linked to galactose 
through either an α2,3 (Neu5Acα2,3Galβ1,4GlcNAcβ1-pAP) or an α2,6 
(Neu5Acα2,6Galβ1,4GlcNAcβ1-pAP) bond) in PBS at 4 °C overnight. After the 
glycopolymer solution was removed, the plates were blocked with 0.15 ml of PBS 
containing 4% BSA at room temperature for 1 h. After four successive washes with 
ice-cold PBS, the plates were incubated in a solution containing influenza virus (8- 
32 HA units in PBS) at 4 °C overnight. After washing as described above, the plates 
were incubated for 2 h at 4 °C with rabbit polyclonal antiserum to either K173 or 
VN1203 virus. The plates were then washed again as before and incubated with 
horseradish peroxidase (HRP)-conjugated goat anti-rabbit IgG antiserum for 2 h 
at 4 °C. After washing, the plates were incubated with O-phenylenediamine 
(Sigma) in PBS containing 0.01% H2O2 for 10 min at room temperature, and 
the reaction was stopped with 0.05 ml of 1 M HCl. The optical density at 
490 nm was determined in a plate reader (Infinite M1000; Tecan). 

Virus binding to human airway tissues. Paraffin-embedded normal human 
trachea (US Biological) and lung (BioChain) tissue sections were deparaffinized 
and rehydrated. Sections were then blocked by using 4% BSA in PBS and 
covered with virus suspensions (64 HA units in PBS) at 4 °C overnight. After 
being washed four times in ice-cold PBS, the sections were incubated with primary 
antibodies for 3 h at 4 °C. The primary antibodies used were as follows: a pool of 
mouse anti-VN1203 HA monoclonal antibodies (15A3, 3G2, 7A11, 8A3, 14C5 
and 18E1; Rockland); rabbit anti-K173 polyclonal antibody; rabbit anti- 
surfactant protein A polyclonal antibody (Millipore); and mouse anti- 
surfactant protein A monoclonal antibody (Abcam). Antibody binding was 
detected by using an IgG secondary antibody conjugated with Alexa Fluor 488 
or Alexa Fluor 633 (Molecular Probes). Sections were also counterstained with 
Hoechst 33342, trihydrochloride, trihydrate (Molecular Probes). The samples 
were examined by using confocal laser scanning microscopy (model LSM 510; 
Carl Zeiss). 

To confirm sialic-acid-specific virus binding, tissue sections were treated, before 
incubation with viruses, with Arthrobacter ureafaciens sialidase (Sigma) for 3 h at 
37 °C. Viruses bound to tissue were detected as described above. 

Experimental infection of ferrets. Animal studies were performed in accordance 
with the Animal Care and Use Committee guidelines of the University of Wisconsin- 
Madison. We used 6-10-month-old female ferrets (Triple F Farms) that were 
serologically negative by haemagglutination inhibition (HI) assay for currently 
circulating human influenza viruses. Six ferrets per group were anaesthetized 
intramuscularly with ketamine and xylazine (5-30 mg and 0.2-6 mg kg−1 of body 
weight, respectively) and inoculated intranasally with 10° p.f.u. (500 μl) of viruses. 
On days 3 and 6 after infection, three ferrets per group were killed for virological 
and pathological examinations. The virus titres in various organs were determined 
by use of plaque assays in MDCK cells. 

Excised tissue samples of nasal turbinates, trachea, lungs, brain, liver, spleen, 
kidney and colon from euthanized ferrets were preserved in 10% phosphate- 
buffered formalin. Tissues were then trimmed and processed for paraffin 
embedding and cut into 5-μm-thick sections. One section from each tissue sample 
was stained by using a standard haematoxylin-and-eosin procedure, whereas 
another one was processed for immunohistological staining with a mixture of 
two anti-influenza virus rabbit antibodies (1:2,000; R309 and anti-VN1203; both 
prepared in our laboratory) that react with CA04 and VN1203, respectively. 
Specific antigen-antibody reactions were visualized by using an indirect two- 
step dextran-polymer technique (Dako EnVision system; Dako) and 
3,3′-diaminobenzidine tetrahydrochloride staining (Dako). 

Ferret transmission study. For transmission studies in ferrets, animals were 
housed in adjacent transmission cages that prevented direct and indirect contact 
between animals but allowed spread of influenza virus through the air (Showa 
Science; Supplementary Fig. 7). Three, five, or six ferrets were inoculated 
intranasally with 10° p.f.u. (500 μl) of virus (inoculated ferrets). Twenty-four hours 
after infection, three, five, or six naive ferrets were each placed in a cage adjacent to 
an inoculated ferret (contact ferrets). The ferrets were monitored for changes in 
body weight and the presence of clinical signs. To assess viral replication in nasal 
turbinates, we determined viral titres in nasal washes collected from virus- 
inoculated and contact ferrets on day 1 after inoculation or co-housing, respect- 
ively, and then every other day. 

Serological tests. Serum samples were collected between days 14 and 20 after infec- 
tion, treated with receptor-destroying enzyme, heat-inactivated at 56 °C for 30 min, 
and tested by use of an HI assay with 0.5% TRBCs (http://www.wpro.who.int/entity/ 
emerging diseases/documents/docs/manualonanimalaidiagnosisandsurveillance. 
pdf). Viruses bearing homologous HA were used as antigens for the HI tests. 

Polykaryon formation representing membrane fusion activity. Monolayers of 
HeLa cells grown in 12-well plates were transfected with the protein expression 
vector pCAGGS (ref. 46) encoding wild-type or mutant HA. At 24 h after transfection, 
cells transiently expressing HA protein were treated with trypsin (1 μg ml−1) in 
MEM containing 0.3% BSA for 30 min at 37 °C to cleave the HA into its HA1 and 
HA2 subunits. Polykaryon formation was induced by exposing the cells to low-pH 
buffer (145 mM NaCl, 20 mM sodium citrate (pH 6.0-5.4)) for 2 min at 37 °C. 
After this exposure, the low-pH buffer was replaced with MEM containing 10% 
FBS and the cells were incubated for 3 h at 37 °C. The cells were then fixed with 
methanol and stained with Giemsa’s solution and photographed with a digital 
camera mounted on an inverted microscope (Nikon, Eclipse Ti). For quantitative 
analyses, cell nuclei were counted in five randomly chosen fields of cell culture. 
Polykaryon formation activity was calculated from the number of nuclei in 
polykaryons divided by the total number of nuclei in the same field. 
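
The calculation described above can be sketched as follows. The per-field ratio comes from the text; averaging the five fields into one value is an assumption made here for illustration, and the nuclei counts and function name are invented.

```python
def polykaryon_activity(fields):
    """Compute polykaryon formation activity per field and its mean.

    fields: list of (nuclei_in_polykaryons, total_nuclei) tuples, one per
    microscope field. Averaging across fields is an assumption.
    """
    fractions = [in_poly / total for in_poly, total in fields]
    return fractions, sum(fractions) / len(fractions)


# Invented counts for five randomly chosen fields.
counts = [(30, 100), (45, 150), (20, 80), (60, 120), (10, 50)]
fractions, mean_activity = polykaryon_activity(counts)
print([round(f, 2) for f in fractions])  # [0.3, 0.3, 0.25, 0.5, 0.2]
print(round(mean_activity, 2))           # 0.31
```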

Thermostability. Viruses (128 HA units in PBS) were incubated for the times 
indicated at 50 °C. Subsequently, infectivity and haemagglutination activity were 
determined by use of plaque assays in MDCK cells and haemagglutination assays 
using 0.5% TRBCs, respectively. 

Neuraminidase (NA) inhibition assay. To assess the sensitivity of viruses to 
the NA inhibitor oseltamivir, NA inhibition assays were performed as described 
previously”. 


Statistical analysis. All statistical analyses were performed using JMP 9.0.0 (SAS 
Institute Inc.). The statistical significance of differences between rgCA04 and H5 
avian/human reassortant viruses was determined by using a Dunnett’s test. 
Comparisons of polykaryon formation between wild-type and mutant HAs were 
done using Tukey’s test. P values of <0.05 were considered significant. 


46. Niwa, H., Yamamura, K. & Miyazaki, J. Efficient selection for high-expression 
transfectants with a novel eukaryotic vector. Gene 108, 193-199 (1991). 

All jazzed up 


Bioscience is thriving in New Orleans as the city bounces back from multiple disasters. 


BY AMANDA MASCARELLI 


In New Orleans, Louisiana, in 2009, three 
men hatched a plan to start a business. One 
man, Mark Heiman, was then a chief scientific 
officer at pharmaceutical company Eli 
Lilly, based in Indianapolis, Indiana; another, 
Dale Pfost, was a serial biotechnology 
entrepreneur; and the third, John Elstrott, was a 
professor of entrepreneurship at Tulane University 
in New Orleans and chair of Whole Foods 
Market, a natural-foods supermarket chain 
based in Austin, Texas. Their meeting led to 
the creation in 2010 of New Orleans biotechnology 
start-up NuMe Health, which develops 
food compounds that promote the growth of 
healthy microbes in the gut. The founders 
attribute the partnership’s creation not only 
to the pooling of their respective strengths, 
but also, says Heiman, to “the magic” of New 
Orleans: the city’s innovative spirit, already 
known to inspire greatness in food and music. 
Increasingly, bioscience researchers are trying 
to capture the same inspiration. 

This bioscience renaissance is powered by 
a programme of investment by the state and 
federal governments and by the private sector, 
much of which was spurred by the effort to help 
the region to recover from the ravages of Hurri- 
cane Katrina in 2005. The area is also benefiting 
from research funds resulting from the Deepwater 
Horizon oil spill in 2010 (see ‘Investment 
after the spill’). “This part of the country has 
become very attractive to people who are very 
idealistic and want to make a difference,” says 
Vijay John, a chemical engineer at Tulane. “A 
lot of risk-taking young people have moved in. 
There’s a sense of commitment, of community.” 


BIOSCIENCE ON THE BAYOU 

Not long ago, New Orleans and the surround- 
ing area had virtually no start-up companies. 
But in the past few years, nearly 60 have sprung 
up, many in the biosciences sector, attracting 
the attention of big-time investors, out-of-state 
venture-capital funds and pharmaceutical 
companies. Although the city is certainly not 
yet an established biotechnology hub, onlook- 
ers see the potential for significant growth. 

“I’m beginning to see a critical mass of these 
elements of an entrepreneurial environment 
come together,” says Tom Dickerson, adviser 
to Louisiana Fund I, a US$26-million health- 
care venture-capital fund based in Baton 
Rouge. The region offers a cost of living that 
is below the US average, as well as robust tax 
incentives for investors and start-ups and 
a strong pool of talent from research and 
educational institutions. 

Although start-ups often rely on just a 
handful of employees, the smattering of new 
businesses is already creating opportunities 
for research jobs and internships at all levels, 
from undergraduates to junior and senior 
researchers. The biotechnology boom is also 
offering scientists a chance to see their 
work commercialized, says Dickerson, 
noting that technology-transfer opportunities 
have become a tool for attracting top-quality 
researchers to universities. 

The heart of the local biotechnology 
movement is BioDistrict New Orleans, a 
6-square-kilometre state-supported eco- 
nomic-development area established in the 
business district in 2005. It draws together 
intellectual resources from the city’s research 
institutions, including Tulane University, 
Xavier University of Louisiana, Louisiana 
State University (LSU) Health Sciences Center 
and Delgado Community College. One big 
project in the district is the Louisiana Cancer 
Research Center, a $102-million facility that 
has created 235 jobs — mostly in research — 
and drawn nearly $155 million in research 
grants. In 2011, officials also broke ground 
on the state’s new teaching hospital, the Uni- 
versity Medical Center, and construction has 
begun on the New Orleans Veterans Affairs 
Medical Center. 


INCUBATING INNOVATION 

Another key component of the BioDistrict 
is the 6,000-square-metre New Orleans Bio- 
Innovation Center, a non-profit life-sciences 
business incubator focused on university 
research and located on Canal Street near the 
French Quarter — close to the hub of New 
Orleans’ economic activity, and to Tulane and 
the LSU Health Sciences Center. 

The idea for the innovation centre took root 
in 2002, when Louisiana’s governor was seek- 
ing ways to spur economic development, forge 
a ‘knowledge economy’ rather than relying on 
tourism, and revive an area full of empty hotels 
and office buildings. “Everyone was looking 
for ways to rejuvenate the area while at the 
same time providing this kind of resource to 
the universities,” says Aaron Miscenich, who 
became president of the centre in 2004. 

At the time, there was little local precedent 
for translating research into products. Miscenich 
didn’t see much of a culture of entrepreneurship. 
“There was brilliant research 
being done in the city, but none was being 
brought to market,” he says. “Historically, 
the technologies were either leaving the 
state or just sitting in filing cabinets.” And 
before Katrina, there was little economic 
investment to support such ventures, he adds. 

When the hurricane hit, plans for the facil- 
ity ground to a halt. “The level of uncertainty 
in our community was just staggering,” says 
Miscenich. In late 2005 and early 2006, centre 
principals re-evaluated the business model, 
says Miscenich, and cut the projected size of 
the centre by half, but the state of Louisiana 
stepped in to provide $47 million in con- 
struction costs, and building went ahead. The 
BioInnovation Center opened in 2011 and the 
state currently supports most of the facility’s 
annual operational costs. 

The centre houses around 20 tenants, from 
biotechnology start-ups and a clinical-research 
organization (CRO) to intellectual-property 
attorneys and venture capitalists. Ultimately, 
it will provide space for up to 50 companies, 
which Miscenich hopes will draw well-estab- 
lished businesses and CROs to the city to create 
a wealth of research positions. “As some of these 
ventures succeed, hopefully they can spool up 
into larger-scale operations serving a national 
and global market,” says Michael Bernstein, 
provost at Tulane. He is optimistic that compa- 
nies will remain and generate more jobs. 


THE SEEDS OF A HUB 

Collaborations are already developing. NuMe 
Health is a tenant of the BioInnovation Center, 
and Heiman, who says the centre helps spur 
alliances, has built promising relationships 
with two companies in the building. “That 
led to a volley of brainstorming and an out- 
line of potential new areas for our companies,” 
he says. Heiman notes that Louisiana offers 
several incentives for start-ups, including an 
angel-investor tax credit and a ‘phase zero’ 
grant that funds companies to draw up pro- 
posals for higher-level funding; such schemes 
helped his company get off the ground, he says, 
and should draw other businesses. 

Sudhir Sinha, a scientist turned entre- 
preneur, sold one biotechnology company 
in 2008 and was planning to retire when he 
learned about the opportunities at the BioIn- 
novation Center. He rented space in the 
incubator to launch a new business, InnoG- 
enomics, which specializes in techniques for 
extracting forensic information from trace 
amounts of degraded DNA samples found at 
crime scenes. He got the company under way 
with start-up capital, including a $150,000 
grant from the US National Science Foundation, 
and received help in developing his 
commercialization plan from graduating 
MBA students and interns at the centre. In 
the past year, InnoGenomics has created 
five full-time lab-research positions; Sinha 
anticipates adding several more in the next 
12-18 months, including jobs for undergraduates 
and postdocs in biochemistry 
and molecular biology. 


Investment after the spill 

Environmental researchers in New Orleans 
and the surrounding area are benefiting from 
the silver lining of a rather large cloud: an 
influx of research dollars after the Deepwater 
Horizon oil spill in the Gulf of Mexico in 2010. 

BP, the operator of the well that spewed 
almost 5 million barrels of oil into the Gulf, 
pledged to provide US$500 million over ten 
years to support research related to the spill 
and its impacts on ecosystems. The company 
established the Gulf of Mexico Research 
Initiative (GoMRI), which so far has awarded 
eight grants to consortia made up of research 
institutions from around the region. 

“With the federal and state budgets 
declining so drastically, it’s one of the few 
sources of external research money right 
now,” says Nancy Rabalais, director of the 
Louisiana Universities Marine Consortium 
based in Chauvin, which has received one 
of the grants. The disaster brought national 
attention to some of the Gulf region’s 
pressing long-term problems, such as 
land loss and hypoxia in coastal waters. 
Ultimately, she says, this has produced a shift 
in conservation: from small-scale mitigation 
projects to whole-ecosystem restoration. 

The Louisiana Universities Marine 
Consortium’s GoMRI project focuses 
on assessing the effects of the spill on 
coastal ecosystems and will create 25-30 
positions, says Rabalais. Graduate students, 
postdoctoral researchers and research 
assistants will study marsh degradation and 
the effects of the oil on organisms living in 
shallow waters near the continental shelf. The 
12 institutions comprising the consortium 
will share $12 million over three years. 

The demand for researchers is also growing 
in the private sector, says Ralph Portier, 
an environmental scientist at Louisiana 
State University (LSU) in Baton Rouge. 
LSU has just approved a PhD programme 
in environmental sciences, set to begin in 
autumn 2012, which will focus on research 
related to the oil spill. The university also 
offers a master’s degree in environmental 
sciences, the “scientific equivalent of an 
MBA”, says Portier: companies send policy 
specialists and other employees without 
research backgrounds to develop expertise 
in areas such as soils and coastal restoration. 
“Employees go back to their companies with 
the environmental training needed to function 
as environmental-scientist planners and 
managers,” says Portier. 

The region is a hotspot of pressing 
environmental issues, adds Portier, noting 
its constellation of ecological attributes 
(including the Mississippi River, wetlands 
and agricultural land) and development 
challenges resulting from industrialization 
of oil and gas resources in the Gulf. “There 
are lots of issues, lots of problems and lots of 
opportunities,” he says. A.M. 

Another tenant, energy-technology com- 
pany ReactWell, has taken advantage of 
the centre’s non-profit status, core labora- 
tory facility and access to Tulane’s office of 
technology transfer, says founder Brandon 
Iglesias. In April, Iglesias won $20,000 in 
start-up capital from the Tulane Business 
Plan Competition and the Domain Com- 
panies New Orleans Entrepreneur Chal- 
lenge, to develop ways to use underground 
geothermal reactors to create synthetic 
crude oil from algal biomass. “It’s quite 
scary to be in a start-up because the risk 
is pretty high,” he says. “But I can’t think of 
a more important issue to be working on 
than energy security.” Iglesias says that New 
Orleans’ location on the major waterway of 
the Mississippi River, close to the oil and 
gas industry and aquaculture in the Gulf of 
Mexico, gave him ready access to experts 
in his field. 

The city still faces formidable challenges. 
More start-up capital is needed for new 
companies, and researchers at local uni- 
versities aren't necessarily trained to meet 
businesses’ needs. 

“We have these institutions in place to 
help with training, but we just need to make 
sure that the programming they're provid- 
ing matches the needs of the companies that 
we’re putting together,” says Miscenich. He 
adds that, for the city to succeed as a bio- 
hub, it must be able to retain talent. Loui- 
siana must maintain its tax incentives and 
business infrastructure, and encourage an 
influx of established pharmaceutical firms 
and CROs to buoy the young companies 
and bring new jobs to the area. 

“We need to come up with a business 
rationale to keep these companies in New 
Orleans,” says Miscenich. “It’s not just going 
to be the good food or the music; it’s going 
to be because it makes economic sense to 
the company.” 


Amanda Mascarelli is a freelance science 
writer based in Denver, Colorado. 


CLARIFICATION 

In the Turning Point on Mayim Bialik 
(Nature 483, 669; 2012) the quote 
about Stephen Hawking attributed to 
Kaley Cuoco was said by her character 
Penny during the show. 

Piled too high 


A passion for science is admirable, but can have 
unwanted outcomes, argues Mariano A. Loza-Coll. 


An independent film about the pursuit of 
a science PhD became a hit last year, 
at least among the fledgling scientists 
that it represents. Thousands flocked to see The 
PhD Movie, based on the hugely popular comic 
strip Piled Higher and Deeper by Jorge Cham, 
a former mechanical engineer, at hundreds of 
on-campus screenings in several countries. Not 
bad for a film produced, directed and acted in 
by graduate students and researchers at the 
California Institute of Technology in Pasadena, 
and funded on a shoe-string budget. 

Why the popularity? The film puts the plight 
of the PhD student on the big screen, giving stu- 
dent audiences a chance to laugh at themselves 
in recognition of the years of schooling, hard 
work and frustration that they are undertaking. 

But I would argue that there is another 
reason: the film tackles some of the negative 
aspects of pursuing a PhD and a science 
career. At its climax, The PhD Movie raises a 
question that crosses many students’ minds: 
why bother? The answer it provides resonates 
with the audience: “Everybody is here because 
they want to be here... You have to embrace 
the things you're passionate about.” 

Yet the passion that drives many scientists to 
investigate the natural and physical world can 
also get them into trouble. It compels them to 
push the boundaries of science, but not always 
with healthful long-term consequences. 

Hard work and a stubborn can-do attitude 
are admirable and rewarding. We put hours, 
days and months into experiments that don’t 
work, because we're passionate about the exhil- 
arating moments when they do. But the trade- 
offs and downsides are many. The romantic 
ideal of the scientist as an independent aca- 
demic investigator uncovering knowledge can 
convince some very smart people to accept 
mediocre pay and delays in starting a family, 
settling down, buying a house and planning 
for retirement — with no guarantees that these 
sacrifices are worthwhile. That is the drama so 
nicely portrayed in The PhD Movie. 

The film could be seen as a warning, showing 
budding researchers that they need to be real- 
istic about their career expectations. But I see it 
as much more. It is a wake-up call for scientists 
and non-scientists alike. After all, it is not scien- 
tists, but the rest of society that may suffer most 
if scientists end up burning out. To ensure that 
scientists can cure diseases, feed the hungry and 
prevent environmental catastrophes, we should 
select them on the basis of their intelligence, 
effectiveness and focus, not their personal 
sacrifices, obsession and stamina. 

Why, then, isn’t society at large demanding 
changes? I suspect that most people just don't 
know or understand what it takes to advance as 
a scientist. In the interests of science outreach, 
The PhD Movie and similar films should be dis- 
tributed more widely to society at large (it can be 
viewed for a small fee at www.phdmovie.com). 

We can use such films to start a conversa- 
tion about how to nurture the practitioners 
and purveyors of science. With a good dose 
of sarcasm, The PhD Movie shows how gradu- 
ate students spend much of their day second- 
guessing why they would spend another 
minute doing science. Another film, the 
2009 documentary Naturally Obsessed (www. 
naturallyobsessed.com), shows how capable, 
aspiring science graduate students become 
‘drop-outs’ and ‘sell-outs’ for reasons that have 
nothing to do with intelligence and drive. 

Scientists complain about how we are por- 
trayed in popular culture. Maybe it is time to 
start sharing who we are, and what we do. 


Mariano A. Loza-Coll is a postdoc in genetics 
at the Salk Institute for Biological Studies in 
La Jolla, California. 